Research Article

A Naive Clustering Algorithm for Text Mining

by Aishwarya Kappala, Sudhakar Godi
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 127 - Number 17
Year of Publication: 2015
Authors: Aishwarya Kappala, Sudhakar Godi
10.5120/ijca2015906717

Aishwarya Kappala, Sudhakar Godi. A Naive Clustering Algorithm for Text Mining. International Journal of Computer Applications 127, 17 (October 2015), 20-24. DOI=10.5120/ijca2015906717

@article{ 10.5120/ijca2015906717,
author = { Aishwarya Kappala, Sudhakar Godi },
title = { A Naive Clustering Algorithm for Text Mining },
journal = { International Journal of Computer Applications },
issue_date = { October 2015 },
volume = { 127 },
number = { 17 },
month = { October },
year = { 2015 },
issn = { 0975-8887 },
pages = { 20-24 },
numpages = { 5 },
url = { https://ijcaonline.org/archives/volume127/number17/22821-2015906717/ },
doi = { 10.5120/ijca2015906717 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Aishwarya Kappala
%A Sudhakar Godi
%T A Naive Clustering Algorithm for Text Mining
%J International Journal of Computer Applications
%@ 0975-8887
%V 127
%N 17
%P 20-24
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Text classification assigns predefined categories to natural language text. In the common “bag-of-words” representation, each document is represented by a vector of word values indicating how frequently each word appears in that document. Large documents, however, pose problems for this representation because they contain much irrelevant or redundant information. This paper explores the effect of another type of value, one that expresses how a word is distributed within a document; such values are called distributional features. All features are computed with a tf-idf style equation and combined with machine learning techniques. Term frequency is one of the major factors for distributional features, as it underlies weighted itemsets. When the goal is to minimize a certain score function, discovering rare data correlations can be more interesting than mining frequent ones; this paper therefore also tackles the problem of discovering rare and weighted itemsets, i.e., the infrequent weighted itemset mining problem. The classifier that gives the most accurate result is selected for categorization. Experiments show that distributional features are useful for text categorization.
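The tf-idf style weighting the abstract refers to can be sketched as follows. This is a minimal illustration only, not the authors' implementation; the function name, the toy documents, and the particular tf and idf normalizations are assumptions made for this example.

```python
import math
from collections import Counter

def tfidf(docs):
    """Compute tf-idf weights per term per tokenized document.

    tf  = term count / document length
    idf = log(total docs / docs containing the term)
    """
    n = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return weights

docs = [["text", "mining", "text"],
        ["text", "mining"],
        ["text", "clusters"]]
w = tfidf(docs)
# A term present in every document ("text") gets idf = log(1) = 0,
# so its weight is zero; rarer terms receive positive weights.
```

Distributional features as described in the paper would replace the plain term count here with statistics about where in the document the term occurs, while keeping the same tf-idf style combination.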

References
  1. R. Bekkerman, R. Elaine, N. Tishby, and Y. Winter, “Distributional Word Clusters versus Words for Text Categorization,” J. Machine Learning Research, vol. 3, pp. 1182-1208, 2003.
  2. G. Narasimha Rao, R. Ramesh, D. Rajesh, and D. Chandra Sekhar, “An Automated Advanced Clustering Algorithm for Text Classification,” International Journal of Computer Science and Technology, vol. 3, issue 2-4, June 2012, eISSN: 0976-8491, pISSN: 2229-4333.
  3. D. Cai, S. Yu, J.-R. Wen, and W.-Y. Ma, “VIPS: A Vision-Based Page Segmentation Algorithm,” Technical Report MSR-TR-2003-79, Microsoft Research, 2003.
  4. J. P. Callan, “Passage Retrieval Evidence in Document Retrieval,” Proc. ACM SIGIR ’94, pp. 302-310, 1994.
  5. Rao, Gudikandhula Narasimha, and P. Jagdeeswar Rao. "A Clustering Analysis for Heart Failure Alert System Using RFID and GPS." ICT and Critical Infrastructure: Proceedings of the 48th Annual Convention of Computer Society of India-Vol I. Springer International Publishing, 2014.
  6. M. Craven, D. DiPasquo, D. Freitag, A. K. McCallum, T. M. Mitchell, K. Nigam, and S. Slattery, “Learning to extract symbolic knowledge from the world wide web,” in Proceedings of the 15th National Conference for Artificial Intelligence, Madison, WI, 1998, pp. 509–516.
  7. F. Debole and F. Sebastiani, “Supervised term weighting for automated text categorization,” in Proceedings of the 18th ACM Symposium on Applied Computing, Melbourne, FL, 2003, pp. 784–788.
  8. T. G. Dietterich, “Machine learning research: Four current directions,” AI Magazine, vol. 18, no. 4, pp. 97–136, 1997.
  9. D. Lewis, “Reuters-21578 text categorization test collection, dist. 1.0,” 1997.
  10. Y. Yang, “An evaluation of statistical approaches to text categorization,” Inf. Retrieval, vol. 1, pp. 69–90, 1999.
  11. S. Shankar and G. Karypis, “A Feature Weight Adjustment Algorithm for Document Classification,” Proc. SIGKDD ’00 Workshop Text Mining, 2000.
  12. K. Sun and F. Bai, “Mining Weighted Association Rules Without Preassigned Weights,” IEEE Trans. Knowledge and Data Eng., vol. 20, no. 4, pp. 489-495, Apr. 2008.
  13. S. Zhu, X. Ji, W. Xu, and Y. Gong, “Multi-labelled classification using maximum entropy method,” in Proc. Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2005, pp. 1041–1048
  14. J. P. Callan, “Passage Retrieval Evidence in Document Retrieval,” Proc. ACM SIGIR ’94, pp. 302-310, 1994.
  15. X. Ling, Q. Mei, C. Zhai, and B. Schatz, “Mining multi-faceted overviews of arbitrary topics in a text collection,” in Proc. 14th ACM SIGKDD Knowl. Discovery Data Mining, 2008, pp. 497–505.
  16. I. Guyon and A. Elisseeff, “An introduction to variable and feature selection,” in J. Mach. Learn. Res., vol. 3, no. 1, pp. 1157–1182, 2003.
  17. T. Joachims, “Transductive inference for text classification using support vector machines,” in Proc. Annu. Int. Conf. Mach. Learn., 1999, pp. 200–209.
  18. X.-L. Li, B. Liu, and S.-K. Ng, “Learning to classify documents with only a small positive training set,” in Proc. 18th Eur. Conf. Mach. Learn., 2007, pp. 201–213.
  19. Y. Li, A. Algarni, S.-T. Wu, and Y. Xue, “Mining negative relevance feedback for information filtering,” in Proc. Web Intell. Intell. Agent Technol., 2009, pp. 606–613.
  20. S.-T. Wu, Y. Li, and Y. Xu, “Deploying approaches for pattern refinement in text mining,” in Proc. IEEE Conf. Data Mining, 2006, pp. 1157–1161.
Index Terms

Computer Science
Information Sciences

Keywords

Text Classification, Text Mining, Machine Learning, Compactness, tf-idf, Weighted Database