Clustering and Classification of Documents based on Meta Information using COATES and COLT Algorithms

International Journal of Computer Applications
© 2015 by IJCA Journal
Volume 122 - Number 21
Year of Publication: 2015
Authors:
Mrunal V. Upasani
Rucha C. Samant
10.5120/21848-5165

Mrunal V. Upasani and Rucha C. Samant. Clustering and Classification of Documents based on Meta Information using COATES and COLT Algorithms. International Journal of Computer Applications 122(21):15-19, July 2015.

@article{key:article,
	author = {Mrunal V. Upasani and Rucha C. Samant},
	title = {Clustering and Classification of Documents based on Meta Information using {COATES} and {COLT} Algorithms},
	journal = {International Journal of Computer Applications},
	year = {2015},
	volume = {122},
	number = {21},
	pages = {15-19},
	month = {July},
	doi = {10.5120/21848-5165}
}

Abstract

Side information, that is, the meta-information associated with documents, can be used in data mining applications such as clustering and classification. In many text mining applications, a large amount of meta-information is available along with the text documents. This meta-information takes different forms, such as links in the documents or user-access behavior recorded in web logs, and can be useful for data mining. These unstructured attributes can contain a tremendous amount of information useful for clustering. However, some of this side information may be noisy, so this system uses an approach that carefully ascertains the coherence of the clustering characteristics of the meta-information with those of the text content; used together in this way, the text data and the meta-information improve the quality of the clustering. The system implements an algorithm that combines classical partitioning algorithms with probabilistic models to create an effective clustering approach based on the meta-information present in the documents, and then shows how the clustering approach can be extended to the classification problem. The COATES and COLT algorithms are used for clustering and classifying text data together with its meta-information, and the paper shows the advantages of such an approach.
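The idea of combining a classical partitioning step with a probabilistic model over side information can be illustrated with a minimal sketch. It assumes each document's text is a sparse {term: weight} dictionary and its side information is a set of binary attributes. It alternates a content step (assign each document to the most cosine-similar centroid, as in k-means) with an auxiliary step that reassigns a document only when a Laplace-smoothed naive-Bayes posterior over its attributes favours another cluster by more than a factor min_gain. All names here (cosine, centroid, init_centroids, content_step, auxiliary_step, coates_sketch, iters, min_gain) and the farthest-first seeding are hypothetical choices for illustration; this is a simplified stand-in inspired by the description above, not the authors' COATES implementation, and it omits any attribute-selection or noise-filtering step.

```python
import math
from collections import Counter, defaultdict


def cosine(u, v):
    """Cosine similarity of two sparse vectors given as {term: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(docs):
    """Mean of a list of sparse vectors (an empty dict for an empty list)."""
    c = defaultdict(float)
    for d in docs:
        for t, w in d.items():
            c[t] += w / len(docs)
    return dict(c)


def init_centroids(texts, k):
    # Hypothetical farthest-first seeding: start at document 0, then repeatedly add
    # the document whose best similarity to the seeds chosen so far is lowest.
    seeds = [0]
    while len(seeds) < k:
        seeds.append(min(range(len(texts)),
                         key=lambda i: max(cosine(texts[i], texts[s]) for s in seeds)))
    return [dict(texts[s]) for s in seeds]


def content_step(texts, centroids):
    # Partitioning step: assign each document to its most cosine-similar centroid.
    return [max(range(len(centroids)), key=lambda j: cosine(x, centroids[j]))
            for x in texts]


def auxiliary_step(attrs, labels, k, min_gain):
    # Probabilistic step: estimate P(cluster) and P(attribute | cluster) from the
    # current partition with add-one smoothing (a naive-Bayes stand-in).
    n = len(labels)
    size = Counter(labels)
    counts = [Counter() for _ in range(k)]
    for a, c in zip(attrs, labels):
        counts[c].update(a)

    def log_posterior(a, j):
        # Unnormalised log P(cluster j | attributes a); only present attributes count.
        lp = math.log((size[j] + 1) / (n + k))
        for x in a:
            lp += math.log((counts[j][x] + 1) / (size[j] + 2))
        return lp

    new_labels = []
    for a, c in zip(attrs, labels):
        best = max(range(k), key=lambda j: log_posterior(a, j))
        # Move the document only if the side information favours another cluster by
        # more than a factor of min_gain; otherwise keep its content-based label.
        gain = log_posterior(a, best) - log_posterior(a, c)
        new_labels.append(best if gain > math.log(min_gain) else c)
    return new_labels


def coates_sketch(texts, attrs, k, iters=5, min_gain=1.2):
    """Alternate content and auxiliary steps; returns one cluster id per document."""
    centroids = init_centroids(texts, k)
    labels = []
    for _ in range(iters):
        labels = content_step(texts, centroids)
        labels = auxiliary_step(attrs, labels, k, min_gain)
        # Recompute centroids; keep the old centroid if a cluster becomes empty.
        centroids = [centroid([x for x, l in zip(texts, labels) if l == j]) or centroids[j]
                     for j in range(k)]
    return labels


if __name__ == "__main__":
    texts = [{"goal": 1, "match": 1}, {"goal": 1, "team": 1},
             {"gene": 1, "cell": 1}, {"cell": 1, "protein": 1},
             {"goal": 1, "cell": 1}]  # last text is equally close to both topics
    attrs = [{"sports"}, {"sports"}, {"bio"}, {"bio"}, {"bio"}]
    print(coates_sketch(texts, attrs, k=2))                          # [0, 0, 1, 1, 1]
    print(coates_sketch(texts, attrs, k=2, min_gain=float("inf")))  # [0, 0, 1, 1, 0]
```

In this toy run, the fifth document's text ties between the two topics and the content step alone leaves it with the sports documents; with the auxiliary step enabled, its "bio" attribute moves it into the biology cluster. Setting min_gain to infinity disables the auxiliary step, which gives a content-only baseline for comparison. A larger min_gain makes the sketch more conservative about trusting side information.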
