
Recent Developments in Text Clustering Techniques

International Journal of Computer Applications
© 2012 by IJCA Journal
Volume 37 - Number 6
Year of Publication: 2012
Saurabh Sharma
Vishal Gupta

Saurabh Sharma and Vishal Gupta. Article: Recent Developments in Text Clustering Techniques. International Journal of Computer Applications 37(6):14-19, January 2012. Full text available. BibTeX

	author = {Saurabh Sharma and Vishal Gupta},
	title = {Article: Recent Developments in Text Clustering Techniques},
	journal = {International Journal of Computer Applications},
	year = {2012},
	volume = {37},
	number = {6},
	pages = {14-19},
	month = {January},
	note = {Full text available}


Efficient extraction of information from text documents is needed to support better business decisions, faster database browsing, and shorter query processing times. Clustering large numbers of text documents into groups, for better management of information, is a wide area in which a great deal of research is currently being pursued. Recent work in this area has tried a number of different techniques. This paper reviews and discusses text clustering and gives partial coverage of all major techniques currently in use for the process.

