Clustering and Classification of Documents based on Meta Information using COATES and COLT Algorithms

Mrunal V. Upasani; Rucha C. Samant

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 122 - Number 21

Year of Publication: 2015

DOI: 10.5120/21848-5165

Mrunal V. Upasani, Rucha C. Samant. Clustering and Classification of Documents based on Meta Information using COATES and COLT Algorithms. International Journal of Computer Applications. 122, 21 (July 2015), 15-19. DOI=10.5120/21848-5165

@article{ 10.5120/21848-5165,
author = { Mrunal V. Upasani and Rucha C. Samant },
title = { Clustering and Classification of Documents based on Meta Information using COATES and COLT Algorithms },
journal = { International Journal of Computer Applications },
issue_date = { July 2015 },
volume = { 122 },
number = { 21 },
month = { July },
year = { 2015 },
issn = { 0975-8887 },
pages = { 15-19 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume122/number21/21848-5165/ },
doi = { 10.5120/21848-5165 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T23:11:07.698520+05:30
%A Mrunal V. Upasani
%A Rucha C. Samant
%T Clustering and Classification of Documents based on Meta Information using COATES and COLT Algorithms
%J International Journal of Computer Applications
%@ 0975-8887
%V 122
%N 21
%P 15-19
%D 2015
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Side information, that is, the meta information that accompanies documents, can be used in data mining applications such as clustering and classification. In many text mining applications, a large amount of meta information is available along with the text documents. It takes different forms, such as links in the document and user-access behavior from web logs, and these unstructured attributes can carry a great deal of information useful for clustering. This system therefore uses an approach that carefully ascertains the coherence of the clustering characteristics of the meta information with those of the text content, so that both the text data and the meta information help improve clustering quality. The system designs an algorithm that combines classical partitioning algorithms with probabilistic models to create an effective clustering approach that uses the meta information present in the documents, and then shows how to extend the clustering approach to the classification problem. The COATES and COLT algorithms are used for clustering and classification of text data along with its meta information, and the advantages of this approach are shown.
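To make the idea of combining a partitioning step with a probabilistic model more concrete, the following is a minimal sketch in Python. It is not the authors' COATES implementation: the function names, the naive Bayes model over side attributes, the Laplace smoothing, and the mixing weight `w` are all assumptions chosen for illustration. Each iteration scores a document against every cluster by mixing cosine similarity to the text centroid with a posterior probability estimated from the side attributes (here, hypothetical link domains), then recomputes the centroids.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency dicts."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Mean of a non-empty list of sparse term-frequency dicts."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: c / len(vectors) for t, c in total.items()}


def cluster_with_side_info(docs, side, seeds, iters=5, w=0.5):
    """Illustrative clustering that mixes text and side information.

    docs  : list of document strings
    side  : list of sets of side attributes, one set per document
    seeds : indices of the documents used as initial centroids
    w     : assumed mixing weight of the side-information posterior
    """
    k = len(seeds)
    tf = [Counter(d.lower().split()) for d in docs]
    centroids = [dict(tf[s]) for s in seeds]
    # Initial assignment from text content only.
    assign = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in tf]

    for _ in range(iters):
        # Probabilistic step: per-cluster attribute counts for naive Bayes.
        sizes = Counter(assign)
        attr_counts = [Counter() for _ in range(k)]
        for a, s in zip(assign, side):
            attr_counts[a].update(s)

        new_assign = []
        for v, s in zip(tf, side):
            # Log posterior of each cluster given the side attributes present.
            logp = []
            for j in range(k):
                lp = math.log((sizes[j] + 1) / (len(docs) + k))
                for attr in s:
                    lp += math.log((attr_counts[j][attr] + 1) / (sizes[j] + 2))
                logp.append(lp)
            m = max(logp)
            probs = [math.exp(x - m) for x in logp]
            z = sum(probs)
            # Partitioning step: text similarity mixed with the posterior.
            scores = [(1 - w) * cosine(v, centroids[j]) + w * probs[j] / z
                      for j in range(k)]
            new_assign.append(max(range(k), key=scores.__getitem__))
        assign = new_assign

        # Recompute centroids; an empty cluster keeps its previous centroid.
        for j in range(k):
            members = [tf[i] for i, a in enumerate(assign) if a == j]
            if members:
                centroids[j] = centroid(members)
    return assign


if __name__ == "__main__":
    docs = ["football match goal", "goal scored football team",
            "election vote parliament", "parliament vote debate",
            "big win tonight"]
    side = [{"sports.example"}, {"sports.example"},
            {"politics.example"}, {"politics.example"},
            {"sports.example"}]
    print(cluster_with_side_info(docs, side, seeds=[2, 0]))
```

In this toy data the last document shares no terms with any other document, so text similarity alone cannot place it; its link attribute is what groups it with the two sports documents. A classifier in the same spirit could assign a new document to the cluster with the highest mixed score, given labeled clusters, though that extension is not shown here.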

Index Terms

Computer Science

Information Sciences

Keywords

Classification, clustering, data mining, meta information, text mining