Research Article

Effective K-Means Document Clustering using Dictionary Defined Lexical Analyzer (DDLA)

by R. Ranga Raj, M. Punithavalli
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 59 - Number 20
Year of Publication: 2012
Authors: R. Ranga Raj, M. Punithavalli
10.5120/9816-4363

R. Ranga Raj, M. Punithavalli. Effective K-Means Document Clustering using Dictionary Defined Lexical Analyzer (DDLA). International Journal of Computer Applications. 59, 20 (December 2012), 4-8. DOI=10.5120/9816-4363

@article{ 10.5120/9816-4363,
author = { R. Ranga Raj and M. Punithavalli },
title = { Effective K-Means Document Clustering using Dictionary Defined Lexical Analyzer (DDLA) },
journal = { International Journal of Computer Applications },
issue_date = { December 2012 },
volume = { 59 },
number = { 20 },
month = { December },
year = { 2012 },
issn = { 0975-8887 },
pages = { 4-8 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume59/number20/9816-4363/ },
doi = { 10.5120/9816-4363 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:04:45.874403+05:30
%A R. Ranga Raj
%A M. Punithavalli
%T Effective K-Means Document Clustering using Dictionary Defined Lexical Analyzer (DDLA)
%J International Journal of Computer Applications
%@ 0975-8887
%V 59
%N 20
%P 4-8
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The rapid growth in the number of documents makes clustering them difficult. Document clustering is the process of grouping related documents from a large collection. Mining related documents from an unlabelled database is challenging. To address this challenge, clustering is used to filter unlabelled documents from a large database. This paper introduces a new approach to document clustering that combines the k-means Enhanced Approach algorithm [1] with the Dictionary Defined Lexical Analyzer (DDLA). On its own, the k-means algorithm clusters numeric values efficiently; with the inclusion of DDLA, characters, words and sentences can also be clustered. Documents are clustered based on their weights [7] using the bisecting k-means algorithm [1, 2] and a topic detection method. Meaningful labels for the documents are discovered based on semantic similarity [8]. Efficient clustering of unlabelled documents with the enhanced k-means algorithm and DDLA is one technique that makes clustering easier.
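
The abstract does not specify how DDLA assigns weights, so the following is only a minimal sketch of the general idea: a dictionary-driven lexical pass turns each document into a numeric weight vector, which k-means can then cluster. The dictionary contents, the `to_vector` weighting, and the sample documents are all hypothetical, and the clustering step is plain Lloyd's k-means seeded with the first k vectors, not the enhanced approach of [1] or bisecting k-means.

```python
import re

# Hypothetical dictionary mapping a term to a topic index (not from the paper).
DICTIONARY = {
    "gene": 0, "protein": 0, "cell": 0,
    "stock": 1, "market": 1, "bank": 1,
}
NUM_TOPICS = 2


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def to_vector(text):
    """Count dictionary hits per topic and normalise them to weights."""
    counts = [0.0] * NUM_TOPICS
    for token in tokenize(text):
        if token in DICTIONARY:
            counts[DICTIONARY[token]] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's k-means; the first k vectors seed the centroids."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every vector.
        labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


docs = [
    "The gene encodes a protein found in every cell.",
    "The stock market fell after the bank report.",
    "Protein levels in the cell changed.",
    "Bank shares lifted the market.",
]
vectors = [to_vector(d) for d in docs]
labels = kmeans(vectors, k=2)
print(labels)
```

In this toy setup the two biology documents and the two finance documents receive different labels, because their dictionary-derived weight vectors sit at opposite corners of the topic space. A real dictionary would be far larger, and the seeding strategy would matter much more.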

References
  1. "Improving the Accuracy and Efficiency of K-Mean Clustering Algorithm", K. A. Abdul Nazeer, M. P. Sebastian. Proceedings of the World Congress on Engineering 2009, Vol. I, WCE 2009, July 1-3, 2009, London, U.K.
  2. "Korean Text Extraction by Local Color Quantization and K-means Clustering in Natural Scene", Anh-Nga Lai, KeonHee Park, Manoj Kumar, GueeSang Lee. Department of Computer Science, Chonnam National University, 500-757 Gwangju, Korea. ltanhnga@gmail.com, gslee@chonnam.ac.kr
  3. "Cluster Analysis for Gene Expression Data", Daxin Jiang, Chun Tang, Aidong Zhang. IEEE Transactions on Knowledge and Data Engineering, 16(11): 1370-1386, 2004.
  4. "Fast Document Clustering Based on Weighted Comparative Advantage", Jie Ji. Intelligent System Lab, The University of Aizu, Aizuwakamatsu, Fukushima, Japan. d8102102@u-aizu.ac.jp
  5. "A Comparison of Document Clustering Techniques", Michael Steinbach, George Karypis, Vipin Kumar. Department of Computer Science, University of Minnesota, Technical Report #00-034. {steinbac, karypis, kumar}@cs.umn.edu
  6. "Clustering of Image Data Set Using K-Means and Fuzzy K-Means Algorithms", Vinod Kumar Dehariya. I.T. Dept., S.A.T.I., Vidisha (M.P.), India. vkdworld@yahoo.com
  7. "Document Clustering in Correlation Similarity Measure Space", Taiping Zhang, Yuan Yan Tang, Bin Fang, Yong Xiang. IEEE Transactions on Knowledge and Data Engineering, Vol. 24, 2012.
  8. "A Web Search Engine-Based Approach to Measure Semantic Similarity between Words", D. Bollegala, Y. Matsuo, M. Ishizuka. IEEE Transactions on Knowledge and Data Engineering, Vol. 23, 2011.
  9. "Spoken Document Retrieval With Unsupervised Query Modeling Techniques", B. Chen, Kuan-Yu Chen, Pei-Ning Chen, Yi-Wen Chen. IEEE Transactions on Audio, Speech, and Language Processing, Vol. 20, Issue 9, 2012.
  10. "Data Extraction for Deep Web Using WordNet", Jer Lang Hong. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, Vol. 41, Issue 6, 2011.
  11. "Unsupervised Motif Acquisition in Speech via Seeded Discovery and Template Matching Combination", A. Muscariello, G. Gravier, F. Bimbot. IEEE Transactions on Audio, Speech, and Language Processing, Vol. 20, Issue 7, 2012.
  12. "Automatic Discovery of Personal Name Aliases from the Web", D. Bollegala, Y. Matsuo, M. Ishizuka. IEEE Transactions on Knowledge and Data Engineering, Vol. 23, Issue 6, 2011.
Index Terms

Computer Science
Information Sciences

Keywords

Clustering, K-Means Enhanced Approach Algorithm, Lexical Analyzer, Defined Dictionary, DDLA