Research Article

Clustering and Classification of Documents based on Meta Information using COATES and COLT Algorithms

by Mrunal V. Upasani, Rucha C. Samant
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 122 - Number 21
Year of Publication: 2015
Authors: Mrunal V. Upasani, Rucha C. Samant
10.5120/21848-5165

Mrunal V. Upasani, Rucha C. Samant. Clustering and Classification of Documents based on Meta Information using COATES and COLT Algorithms. International Journal of Computer Applications. 122, 21 (July 2015), 15-19. DOI=10.5120/21848-5165

@article{ 10.5120/21848-5165,
author = { Mrunal V. Upasani and Rucha C. Samant },
title = { Clustering and Classification of Documents based on Meta Information using COATES and COLT Algorithms },
journal = { International Journal of Computer Applications },
issue_date = { July 2015 },
volume = { 122 },
number = { 21 },
month = { July },
year = { 2015 },
issn = { 0975-8887 },
pages = { 15-19 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume122/number21/21848-5165/ },
doi = { 10.5120/21848-5165 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T23:11:07.698520+05:30
%A Mrunal V. Upasani
%A Rucha C. Samant
%T Clustering and Classification of Documents based on Meta Information using COATES and COLT Algorithms
%J International Journal of Computer Applications
%@ 0975-8887
%V 122
%N 21
%P 15-19
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Side information, that is, the meta-information that accompanies documents, can be used in data mining applications such as clustering and classification. In many text mining applications, a large amount of meta-information is available along with the text documents. This meta-information takes different forms, such as links in the document or user-access behavior from web logs, and can be useful for data mining. These unstructured attributes can carry a great deal of information for clustering purposes. The system therefore uses an approach that carefully ascertains the coherence of the clustering characteristics of the meta-information with those of the text content. Both the text data and the meta-information help to improve the quality of the clustering. The work designs an algorithm that combines classical partitioning algorithms with probabilistic models to create an effective clustering approach using the meta-information present in the documents. It then shows how to extend the clustering approach to the classification problem. The COATES and COLT algorithms are used for clustering and classification of text data together with the meta-information, and the advantages of this approach are shown.
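As a rough illustration of combining a partitioning step with a probabilistic model of the meta-information, the following Python sketch alternates between a cosine-similarity assignment on text content and a naive-Bayes-style posterior over clusters given each document's side attributes. It is not the COATES algorithm as published. The function names (`cosine`, `centroid`, `cluster_with_side_info`), the multiplicative scoring rule, the Laplace smoothing, and the toy data are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Mean of a non-empty list of sparse term-weight dicts."""
    total = defaultdict(float)
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}


def cluster_with_side_info(texts, side, seeds, iterations=5, smoothing=1.0):
    """Hypothetical sketch: texts are term->weight dicts, side are sets of
    attributes, seeds are indices of the documents used as initial centroids."""
    k, n = len(seeds), len(texts)
    centroids = [dict(texts[i]) for i in seeds]
    # Initial assignment uses text content only.
    labels = [max(range(k), key=lambda c: cosine(t, centroids[c])) for t in texts]

    for _ in range(iterations):
        # Probabilistic phase: count how often each attribute occurs per cluster.
        sizes = [labels.count(c) for c in range(k)]
        counts = [defaultdict(int) for _ in range(k)]
        for attrs, c in zip(side, labels):
            for a in attrs:
                counts[c][a] += 1

        def posterior(attrs):
            # Smoothed naive-Bayes posterior P(cluster | side attributes).
            logp = []
            for c in range(k):
                lp = math.log((sizes[c] + smoothing) / (n + k * smoothing))
                for a in attrs:
                    lp += math.log((counts[c][a] + smoothing)
                                   / (sizes[c] + 2 * smoothing))
                logp.append(lp)
            m = max(logp)
            weights = [math.exp(x - m) for x in logp]
            z = sum(weights)
            return [w / z for w in weights]

        # Combined reassignment: content similarity weighted by the posterior.
        new_labels = []
        for t, attrs in zip(texts, side):
            post = posterior(attrs)
            new_labels.append(
                max(range(k), key=lambda c: cosine(t, centroids[c]) * post[c]))

        # Partitioning phase: recompute centroids from the new assignment.
        for c in range(k):
            members = [t for t, l in zip(texts, new_labels) if l == c]
            if members:
                centroids[c] = centroid(members)

        if new_labels == labels:
            break
        labels = new_labels
    return labels


# Toy data (invented for this sketch): term weights plus one link attribute each.
texts = [
    {"goal": 2, "match": 1},
    {"goal": 1, "team": 2},
    {"vote": 2, "party": 1},
    {"vote": 1, "election": 2},
    {"goal": 1.0, "vote": 1.2},
]
side = [{"link:sports"}, {"link:sports"}, {"link:politics"},
        {"link:politics"}, {"link:sports"}]
seeds = [0, 2]

content_only = cluster_with_side_info(texts, side, seeds, iterations=0)
combined = cluster_with_side_info(texts, side, seeds)
print(content_only, combined)
```

In this toy data the last document's text leans slightly toward the second cluster, while its side attribute co-occurs with the first. Passing `iterations=0` gives the content-only assignment for comparison.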

Index Terms

Computer Science
Information Sciences

Keywords

Classification, clustering, data mining, meta information, text mining