Research Article

Dimensionality Reduction using Clustering Technique

Published in March 2017 by Snehal D.borase, Satish S.banait
Emerging Trends in Computing
Foundation of Computer Science USA
ETC2016 - Number 4
March 2017
Authors: Snehal D.borase, Satish S.banait

Snehal D.borase, Satish S.banait . Dimensionality Reduction using Clustering Technique. Emerging Trends in Computing. ETC2016, 4 (March 2017), 17-22.

@article{
author = { Snehal D.borase, Satish S.banait },
title = { Dimensionality Reduction using Clustering Technique },
journal = { Emerging Trends in Computing },
issue_date = { March 2017 },
volume = { ETC2016 },
number = { 4 },
month = { March },
year = { 2017 },
issn = { 0975-8887 },
pages = { 17-22 },
numpages = 6,
url = { /proceedings/etc2016/number4/27323-6276/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 Emerging Trends in Computing
%A Snehal D.borase
%A Satish S.banait
%T Dimensionality Reduction using Clustering Technique
%J Emerging Trends in Computing
%@ 0975-8887
%V ETC2016
%N 4
%P 17-22
%D 2017
%I International Journal of Computer Applications
Abstract

Clustering is a method of grouping known objects into homogeneous classes. It plays a major role in data mining applications such as computational biology, medical diagnosis, information retrieval, customer relationship management (CRM), scientific data exploration, marketing, and web analysis, and the design of clustering algorithms is an active research area. "Big data" involves terabytes to petabytes of data and is challenging because of five characteristics: volume, velocity, variety, variability, and complexity. It is therefore difficult to handle with conventional tools and techniques. Clustering techniques face several open issues: how to process the data so that big data is clustered into a more compact format, the stability problem from which clustering algorithms suffer, and the ensemble of single-level and multi-level clustering. A further issue is that no prior knowledge about the data is available. In addition, selecting input parameters such as the number of nearest neighbours and the number of clusters makes clustering a challenging task. The main objective of this paper is to study and analyze existing clustering algorithms, the impact of dimensionality reduction, and the handling of outliers.
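
The sensitivity to the number of clusters noted in the abstract can be illustrated with a small sketch. This is not the authors' method: it is a minimal plain-Python k-means (Lloyd's iteration), and the helper names (`kmeans`, `wcss`), the seeded random initialization, and the toy data set are all assumptions made for illustration. It shows how the within-cluster sum of squares depends on the chosen number of clusters k.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(members):
    """Coordinate-wise mean of a non-empty list of tuples."""
    return tuple(sum(col) / len(members) for col in zip(*members))


def assign(points, centroids):
    """Group points by the index of their nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, k, iters=50, seed=0):
    """Lloyd-style k-means; initial centroids are k sampled input points (assumption)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = assign(points, centroids)
        # An empty cluster keeps its previous centroid.
        updated = [mean(m) if m else centroids[i] for i, m in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


def wcss(centroids, clusters):
    """Within-cluster sum of squared distances to the assigned centroid."""
    return sum(sq_dist(p, c) for c, members in zip(centroids, clusters) for p in members)


# Hypothetical toy data: two well-separated groups of four points each.
data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
for k in (1, 2, 3):
    cents, groups = kmeans(data, k)
    print(k, wcss(cents, groups))
```

On data like this, the within-cluster sum of squares drops sharply once k matches the number of natural groups and changes little afterwards. On real data the drop is rarely this clear, which is one reason choosing k is hard.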

Index Terms

Computer Science
Information Sciences

Keywords

Clustering Algorithms, Big Data, Nearest Neighbours, Outliers