Research Article

Data Mining Methods: A Review

by Dimitrios Papakyriakou, Ioannis S. Barbounakis
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 183 - Number 48
Year of Publication: 2022
Authors: Dimitrios Papakyriakou, Ioannis S. Barbounakis
10.5120/ijca2022921884

Dimitrios Papakyriakou, Ioannis S. Barbounakis. Data Mining Methods: A Review. International Journal of Computer Applications. 183, 48 (Jan 2022), 5-19. DOI=10.5120/ijca2022921884

@article{ 10.5120/ijca2022921884,
author = { Dimitrios Papakyriakou and Ioannis S. Barbounakis },
title = { Data Mining Methods: A Review },
journal = { International Journal of Computer Applications },
issue_date = { Jan 2022 },
volume = { 183 },
number = { 48 },
month = { Jan },
year = { 2022 },
issn = { 0975-8887 },
pages = { 5-19 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume183/number48/32253-2022921884/ },
doi = { 10.5120/ijca2022921884 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T01:20:03.114121+05:30
%A Dimitrios Papakyriakou
%A Ioannis S. Barbounakis
%T Data Mining Methods: A Review
%J International Journal of Computer Applications
%@ 0975-8887
%V 183
%N 48
%P 5-19
%D 2022
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The Big Data revolution is driven by the evolution of technology, which enables firms to gather extremely large amounts of data and to disseminate knowledge to their customers, partners, and competitors in the marketplace [1]. The deeper we dive into technology, the more the physical world merges with the virtual one; consider, for instance, the IoT (Internet of Things), a network of physical devices connected together and able to exchange data. A company can choose among many Big Data platforms, such as Hadoop and Apache Spark, to analyze large datasets. Moreover, many data mining techniques, such as Classification, Clustering Analysis, Correlation Analysis, Decision Tree Induction, and Regression Analysis, can be used to identify patterns for knowledge discovery. This paper presents an extensive review and summary of Big Data mining techniques, together with the most common data mining algorithms suitable for handling large datasets. The review outlines the general pros and cons of these algorithms and the fields in which each is appropriate, and serves as a guideline to help data mining researchers choose algorithms based on their needs and the given datasets.

References
  1. X. Zhu, B. Song, Y. Ni, Y. Ren, R. Li, (2016). Business Trends in the Digital Era: Evolution of Theories and Applications, Springer.
  2. Laney, D. (2001) 3D Data Management: Controlling Data Volume, Velocity and Variety. META Group Research Note, 6.
  3. McAfee, A. and Brynjolfsson, E. (2012). Big Data. The Management Revolution. Harvard Business Review, 90(10), pp. 60–9.
  4. Volume of data/information created, captured, copied, and consumed worldwide from 2010 to 2024. Statista 2020. [Online]. Available: https://www.statista.com/statistics/871513/worldwide-data-created/.
  5. Brands, K. (2014). Big Data and Business Intelligence for Management Accountants. Strategic Finance, 96(6), pp. 64–5.
  6. Gandomi, A. and Haider, M. (2015). Beyond the hype: Big Data concepts, methods, and analysis. International Journal of Information Management, 35(2), pp. 137–44.
  7. Hashem, I.A.T., Yaqoob, I., Anuar, N.B., Mokhtar, S., Gani, A. and Khan, S.U. (2015). The rise of "Big Data" on cloud computing: Review and open research issues. Information Systems, 47(1), pp. 98–115.
  8. Bendler, J., Wagner, S., Brandt, T. and Neumann, D. (2014). Taming uncertainty in Big Data: Evidence from social media in Urban Areas. Business & Information Systems Engineering, 6(5), pp. 279–88.
  9. Ishwarappa, K. and Anuradha, J. (2015). A Brief Introduction on Big Data 5Vs Characteristics and Hadoop Technology. Procedia Computer Science, 48(1), pp. 319–324.
  10. Trupti, A. Kumbhare, and Santosh, V. Chobe, (2014). An Overview of Association Rule Mining Algorithms, International Journal of Computer Science and Information Technologies, Vol.5(1), pp. 927-930.
  11. Sudhir, M. Gorade, Ankit Deo and Pritesh Purohit, (2017). A Study of Some Data Mining Classification Techniques. International Research Journal of Engineering and Technology. Vol. 4, Issue. 4, pp. 3112-3115.
  12. J. Han, M. Kamber and J. Pei, (2010). Data Mining: Concepts and Techniques (3rd ed.) University of Illinois. Chapter 8, pp. 99-117.
  13. Duda RO, Hart PE, and Stork DG, (2000). Pattern classification, 2nd ed. New York: John Wiley & Sons.
  14. Rao, R. P. N., & Scherer, R. (2010). Statistical Pattern Recognition and Machine Learning in Brain-Computer Interfaces. In Statistical Signal Processing for Neuroscience and Neurotechnology (1 ed., pp. 335-368). Elsevier B.V.
  15. Auria, Laura and Moro, R. A., Support Vector Machines (SVM) as a Technique for Solvency Analysis (August 1, 2008). DIW Berlin Discussion Paper No. 811, Available at SSRN: https://ssrn.com/abstract=1424949.
  16. S. Karamizadeh, S. M. Abdullah, M. Halimi, J. Shayan and M. J. Rajabi, (2014). "Advantage and drawback of support vector machine functionality," 2014 International Conference on Computer, Communications, and Control Technology (I4CT), pp. 63-65, doi: 10.1109/I4CT.2014.6914146.
  17. L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone (1984). Classification and Regression Trees. Chapman & Hall, New York, NY.
  18. S. K. Murthy, S. Kasif, and S. Salzberg, (1994). A system for induction of oblique decision trees. J. Artif. Int. Res., 2(1):1–32.
  19. J. Quinlan, (1986). Induction of decision trees. Machine Learning, 1(1):81–106.
  20. J. Quinlan, (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann.
  21. Mean Squared Error (MSE). [Online]. Available: https://www.probabilitycourse.com/chapter9/9_1_5_mean_squared_error_MSE.php
  22. Nova, D., Estévez, P.A. (2014). A review of learning vector quantization classifiers. Neural Comput & Applic 25, 511–524, https://doi.org/10.1007/s00521-013-1535-3
  23. D. Nova and P. Estevez, (2013). “A Review of Learning Vector Quantization Classifiers,” Neural Computing and Applications, vol. 25, pp. 511–524.
  24. A. Priyono, M. Ridwan, A. J. Alias, R. A. O. Rahmat, A. Hassan, and M. A. M. Ali, (2012). “Application of LVQ neural network in realtime adaptive traffic signal control,” Jurnal Teknologi, vol. 42, no. 1, pp. 29–44.
  25. Y. Freund, (1995). “Boosting a weak learning algorithm by majority”, Information and computation. 121(2):256–285.
  26. Y. Freund and R.E. Schapire, (1999). “A short introduction to boosting” Journal of Japanese Society for Artificial Intelligence, 14(5):771-780.
  27. Huh, Myung-Hoe, & Lee, Yonggoo. (2006). “LMS and LTS-type Alternatives to Classical Principal Component Analysis”. Communications for Statistical Applications and Methods, 13 (2), 233–241. https://doi.org/10.5351/CKSS.2006.13.2.233
  28. R. Agrawal and R. Srikant, (March 1995). “Mining Sequential Patterns”. In Proc. of the 11th Int'l Conference on Data Engineering, Taipei, Taiwan.
  29. Fournier-Viger, P., Lin, J.C.W., Kiran, R.U., Koh, Y.S., Thomas, R. (2017). “A survey of sequential pattern mining”. Data Sci. Pattern Recogn. s1, 54–77.
  30. Thabet Slimani, and Amor Lazzez. (2013). “Sequential Mining: Patterns and Algorithms Analysis”, International Journal of Computer and Electronics Research, Volume 2, Issue 5, pp 639-647.
  31. Mooney, C. H. & Roddick, J. F., (Feb 2013) “Sequential Pattern Mining — Approaches and Algorithms”, ACM Computing Surveys, vol. 45, no. 2, pp. 1–39, DOI: 10.1145/2431211.2431218.
  32. Kum, H.-C., Chang, J. H., & Wang, W. (2006). “Sequential Pattern Mining in Multi-Databases via Multiple Alignment”. Data Min. Knowl. Discov., 12(2-3), 151-180.
  33. S. Anitha Elavaras, (Jan 2011). “A Survey on Partitional Clustering Algorithm”, International Journal of Enterprise Computing and Business Systems, Vol. 1 Issue 1.
  34. Kaufman, L., & Rousseeuw, P. J., (1990). “Finding groups in data: an introduction to cluster analysis.” New York, Wiley.
  35. T. Soni Madhulatha. (April 2012). “An overview on Clustering Methods”. IOSR Journal of Engineering., Vol. 2(4) pp: 719-725.
  36. Ester, M., Kriegel, H.P., Sander, J., Xu, X. (1996). “A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise”. In Proc. KDD.
  37. Dimitrios Papakyriakou, Dimitra Kottou and Ioannis Kostouros. (April 2018). “Benchmarking Raspberry Pi 2 Beowulf Cluster”. International Journal of Computer Applications 179(32):21-27.
  38. Dimitrios Papakyriakou. (August 2019). “Benchmarking Raspberry Pi 2 Hadoop Cluster”. International Journal of Computer Applications 178(42):37-47.
Index Terms

Computer Science
Information Sciences

Keywords

Big Data, Big Data Analytics, Data Mining Algorithms, Data Clustering