Research Article

Performance Analysis of Various Data Mining Algorithms: A Review

by Dharminder Kumar, Suman
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 32 - Number 6
Year of Publication: 2011
Authors: Dharminder Kumar, Suman
10.5120/3906-5476

Dharminder Kumar, Suman. Performance Analysis of Various Data Mining Algorithms: A Review. International Journal of Computer Applications. 32, 6 (October 2011), 9-16. DOI=10.5120/3906-5476

@article{ 10.5120/3906-5476,
author = { Dharminder Kumar, Suman },
title = { Article:Performance Analysis of Various Data Mining Algorithms:A Review },
journal = { International Journal of Computer Applications },
issue_date = { October 2011 },
volume = { 32 },
number = { 6 },
month = { October },
year = { 2011 },
issn = { 0975-8887 },
pages = { 9-16 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume32/number6/3906-5476/ },
doi = { 10.5120/3906-5476 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:18:27.361914+05:30
%A Dharminder Kumar
%A Suman
%T Article:Performance Analysis of Various Data Mining Algorithms:A Review
%J International Journal of Computer Applications
%@ 0975-8887
%V 32
%N 6
%P 9-16
%D 2011
%I Foundation of Computer Science (FCS), NY, USA
Abstract

A data warehouse is the central point of data integration for business intelligence. Data mining, the discovery of useful patterns and correlations among attributes, is an emerging trend in database research. This paper reviews data mining techniques for classification, clustering, and association analysis: decision trees (such as C4.5), rule set classifiers, kNN, and Naïve Bayes for classification; k-Means and EM for clustering; SVM as a machine learning method; and Apriori for association analysis. These algorithms are applied to a data warehouse to extract useful information. For each algorithm, we give a description, its impact, and a review. We also compare the classifiers by accuracy; the comparison shows that the rule set classifier has higher accuracy than the other classifiers when implemented in Weka. These algorithms help increase sales and performance in industries such as banking, insurance, and medicine, and they also help detect fraud and intrusion for the benefit of society.
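
The kind of accuracy comparison mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's Weka experiment: the toy data set, the choice of kNN and Gaussian Naïve Bayes, and all function names below are assumptions made for illustration only, and the numbers it prints say nothing about the results reported in the paper.

```python
import math
from collections import Counter, defaultdict


def knn_predict(train, x, k=3):
    """Return the majority label among the k training points nearest to x."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def fit_gaussian_nb(train):
    """Estimate a prior and per-feature (mean, variance) for each class."""
    by_class = defaultdict(list)
    for features, label in train:
        by_class[label].append(features)
    model = {}
    for label, rows in by_class.items():
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            # Small constant keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9
            stats.append((mean, var))
        model[label] = (len(rows) / len(train), stats)
    return model


def nb_predict(model, x):
    """Return the class with the largest log posterior for x."""
    def log_posterior(label):
        prior, stats = model[label]
        score = math.log(prior)
        for value, (mean, var) in zip(x, stats):
            score += -0.5 * math.log(2 * math.pi * var)
            score -= (value - mean) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


def accuracy(predict, test):
    """Fraction of held-out examples whose predicted label is correct."""
    correct = sum(1 for features, label in test if predict(features) == label)
    return correct / len(test)


# Hypothetical toy data: (features, label) pairs in two well-separated groups.
TRAIN = [((1.0, 1.1), "a"), ((1.2, 0.9), "a"), ((0.8, 1.0), "a"),
         ((3.0, 3.2), "b"), ((3.1, 2.9), "b"), ((2.9, 3.0), "b")]
TEST = [((1.1, 1.0), "a"), ((0.9, 1.2), "a"),
        ((3.0, 3.0), "b"), ((3.2, 3.1), "b")]

nb_model = fit_gaussian_nb(TRAIN)
knn_acc = accuracy(lambda x: knn_predict(TRAIN, x, k=3), TEST)
nb_acc = accuracy(lambda x: nb_predict(nb_model, x), TEST)
print(f"kNN accuracy: {knn_acc:.2f}, Naive Bayes accuracy: {nb_acc:.2f}")
```

Both classifiers are scored on the same held-out examples, so their accuracies are directly comparable. A real comparison would use a much larger data set and cross-validation rather than a single small split.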

Index Terms

Computer Science
Information Sciences

Keywords

Decision Tree, Rule set Classifier, kNN, Naïve Bayes, k-Means, EM, SVM, Apriori