Research Article

Hierarchical Document Clustering: A Review

Published in November 2011 by Ashish Jaiswal, Prof. Nitin Janwe
2nd National Conference on Information and Communication Technology
Foundation of Computer Science USA
NCICT - Number 3

Ashish Jaiswal, Prof. Nitin Janwe. Hierarchical Document Clustering: A Review. 2nd National Conference on Information and Communication Technology. NCICT, 3 (November 2011), 37-41.

@article{
author = { Ashish Jaiswal and Prof. Nitin Janwe },
title = { Hierarchical Document Clustering: A Review },
journal = { 2nd National Conference on Information and Communication Technology },
issue_date = { November 2011 },
volume = { NCICT },
number = { 3 },
month = { November },
year = { 2011 },
issn = { 0975-8887 },
pages = { 37-41 },
numpages = 5,
url = { /proceedings/ncict/number3/4294-ncict024/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 2nd National Conference on Information and Communication Technology
%A Ashish Jaiswal
%A Prof. Nitin Janwe
%T Hierarchical Document Clustering: A Review
%J 2nd National Conference on Information and Communication Technology
%@ 0975-8887
%V NCICT
%N 3
%P 37-41
%D 2011
%I International Journal of Computer Applications
Abstract

As the number of text documents on the internet grows rapidly, grouping similar documents for a variety of applications has attracted the attention of researchers. However, most clustering methods struggle with high dimensionality, scalability, accuracy, and meaningful cluster labels. This paper reviews the well-known methods of document clustering and explains the hierarchical document clustering method in detail. The study shows that hierarchical document clustering performs well, but there is still scope for improvement on the problems mentioned above.

Index Terms

Computer Science
Information Sciences

Keywords

Document clustering
Hierarchical clustering
Frequent item sets