Recent Developments in Text Clustering Techniques

Saurabh Sharma; Vishal Gupta

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 37 - Number 6

Year of Publication: 2012

Authors: Saurabh Sharma, Vishal Gupta

DOI: 10.5120/4611-6604

Saurabh Sharma, Vishal Gupta. Recent Developments in Text Clustering Techniques. International Journal of Computer Applications. 37, 6 (January 2012), 14-19. DOI=10.5120/4611-6604

@article{10.5120/4611-6604,
  author = {Saurabh Sharma and Vishal Gupta},
  title = {Recent Developments in Text Clustering Techniques},
  journal = {International Journal of Computer Applications},
  issue_date = {January 2012},
  volume = {37},
  number = {6},
  month = {January},
  year = {2012},
  issn = {0975-8887},
  pages = {14-19},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume37/number6/4611-6604/},
  doi = {10.5120/4611-6604},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article

%1 2024-02-06T20:23:36.227447+05:30

%A Saurabh Sharma

%A Vishal Gupta

%T Recent Developments in Text Clustering Techniques

%J International Journal of Computer Applications

%@ 0975-8887

%V 37

%N 6

%P 14-19

%D 2012

%I Foundation of Computer Science (FCS), NY, USA

Abstract

Efficient extraction of information from text documents is needed to support better business decisions, faster database browsing, and shorter query processing times. Clustering large numbers of text documents into groups, for better management of information, is an area of active research. Recent work in this area has tried a number of different techniques. This paper reviews and discusses text clustering and covers, in part, all the major techniques currently in use for the process.

Index Terms

Computer Science

Information Sciences

Keywords

Text clustering, K-means clustering, hierarchical clustering, topic tracing, feature selection, ontology, WordNet, frequent word sequence