Document Clustering in Distributed Environment

Research Article

Published in February 2013 by R. Brintha, S. Bhuvaneswari

National Conference on Future Computing 2013

Foundation of Computer Science USA

NCFC - Number 1

February 2013

Authors: R. Brintha, S. Bhuvaneswari

R. Brintha, S. Bhuvaneswari. Document Clustering in Distributed Environment. National Conference on Future Computing 2013. NCFC, 1 (February 2013), 30-33.

@article{
author = { R. Brintha and S. Bhuvaneswari },
title = { Document Clustering in Distributed Environment },
journal = { National Conference on Future Computing 2013 },
issue_date = { February 2013 },
volume = { NCFC },
number = { 1 },
month = { February },
year = { 2013 },
issn = { 0975-8887 },
pages = { 30-33 },
numpages = { 4 },
url = { /proceedings/ncfc/number1/10406-1008/ },
publisher = { Foundation of Computer Science (FCS), NY, USA },
address = { New York, USA }
}

%0 Proceeding Article
%1 National Conference on Future Computing 2013
%A R. Brintha
%A S. Bhuvaneswari
%T Document Clustering in Distributed Environment
%J National Conference on Future Computing 2013
%@ 0975-8887
%V NCFC
%N 1
%P 30-33
%D 2013
%I International Journal of Computer Applications

Abstract

Document clustering has become a widely used technique as large numbers of documents accumulate daily in areas such as newsgroups, government organizations, the Internet, and digital libraries. Document clustering is the process of grouping similar documents into clusters. A good document clustering algorithm should produce high intra-cluster similarity and low inter-cluster similarity, i.e., the documents within a cluster should be more relevant to one another than to the documents of other clusters. In this paper, the implementation of document clustering in a distributed environment based on a peer-to-peer network architecture is reviewed. The documents at each local site are clustered using the K-means algorithm. A hierarchical clustering is obtained when the clusters in each peer are combined to form the next level of clusters. This process repeats until a global cluster is formed and made available to all the peers. The clustered documents find application in search engines.
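
The two-stage process described in the abstract (local K-means at each peer, then repeated merging of peer clusters into higher levels) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The term-count vectors, cosine similarity, random centroid seeding, and the names `build_vocab`, `vectorize`, `kmeans`, `hierarchical_merge`, and `group_size` are all assumptions made for illustration, and merging is approximated by re-clustering the pooled centroids of neighbouring peers.

```python
import math
import random
from collections import Counter


def build_vocab(peer_docs):
    """Sorted list of every term seen at any peer (assumed shared vocabulary)."""
    return sorted({t for docs in peer_docs for d in docs for t in d.lower().split()})


def vectorize(doc, vocab):
    """Raw term-count vector; a real system would likely use TF-IDF and stemming."""
    counts = Counter(doc.lower().split())
    return [float(counts[t]) for t in vocab]


def cosine(a, b):
    """Cosine similarity between two vectors, 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain K-means with cosine similarity; returns the k centroids.

    Centroids are seeded by random sampling, so results depend on the seed.
    """
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in vectors:
            best = max(range(k), key=lambda i: cosine(v, centroids[i]))
            groups[best].append(v)
        # An empty group keeps its previous centroid.
        centroids = [mean(g) if g else centroids[i] for i, g in enumerate(groups)]
    return centroids


def hierarchical_merge(peer_docs, vocab, k, group_size=2):
    """Cluster locally at each peer, then merge level by level.

    At each level, the centroids of `group_size` neighbouring peers are pooled
    and clustered again, until one global set of k centroids remains.
    """
    level = [kmeans([vectorize(d, vocab) for d in docs], k) for docs in peer_docs]
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), group_size):
            pooled = [c for cents in level[i:i + group_size] for c in cents]
            next_level.append(kmeans(pooled, k))
        level = next_level
    return level[0]


if __name__ == "__main__":
    peers = [
        ["apple banana apple", "banana apple fruit", "car engine wheel"],
        ["engine car car", "fruit banana", "wheel car engine"],
    ]
    vocab = build_vocab(peers)
    for centroid in hierarchical_merge(peers, vocab, k=2):
        print([round(x, 2) for x in centroid])
```

Only centroids travel between levels in this sketch, not the documents themselves, which is one plausible way to keep communication low in a peer-to-peer setting. Each peer must hold at least `k` documents for the sampling step to work.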

Index Terms

Computer Science

Information Sciences

Keywords

Clustering Distributed Knowledge Discovery K-means Algorithm Supernodes Intercluster Intracluster