Research Article

Document Clustering in Distributed Environment

Published in February 2013 by R. Brintha, S. Bhuvaneswari
National Conference on Future Computing 2013
Foundation of Computer Science USA
NCFC - Number 1
February 2013
Authors: R. Brintha, S. Bhuvaneswari

R. Brintha, S. Bhuvaneswari. Document Clustering in Distributed Environment. National Conference on Future Computing 2013. NCFC, 1 (February 2013), 30-33.

@article{
author = { R. Brintha and S. Bhuvaneswari },
title = { Document Clustering in Distributed Environment },
journal = { National Conference on Future Computing 2013 },
issue_date = { February 2013 },
volume = { NCFC },
number = { 1 },
month = { February },
year = { 2013 },
issn = {0975-8887},
pages = { 30-33 },
numpages = 4,
url = { /proceedings/ncfc/number1/10406-1008/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 National Conference on Future Computing 2013
%A R. Brintha
%A S. Bhuvaneswari
%T Document Clustering in Distributed Environment
%J National Conference on Future Computing 2013
%@ 0975-8887
%V NCFC
%N 1
%P 30-33
%D 2013
%I International Journal of Computer Applications
Abstract

Document clustering has become a widely used technique as large numbers of documents accumulate daily in areas such as newsgroups, government organizations, the Internet, and digital libraries. Document clustering is the process of grouping similar documents into clusters. A good document clustering algorithm should produce high intra-cluster similarity and low inter-cluster similarity; that is, the documents within a cluster should be more similar to one another than to the documents of other clusters. This paper reviews the implementation of document clustering in a distributed environment based on a peer-to-peer network architecture. The documents at each local site are clustered using the K-means algorithm. Hierarchical clustering is obtained when the clusters of the individual peers are combined to form the next level of clusters. This process repeats until a global cluster is formed and made available to all peers. The clustered documents find application in search engines.
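
The pipeline outlined in the abstract (local K-means on each peer, then repeated merging of peer-level clusters until one global clustering remains) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes documents have already been converted to numeric vectors (for example, term weights), and the names `kmeans`, `global_centroids`, and `group_size` are hypothetical. Grouping neighbouring peers in fixed-size batches is also an assumption made for illustration; the abstract does not specify how peers are combined at each level.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return tuple(sum(col) / n for col in zip(*vectors))


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain Lloyd-style K-means; returns a list of at most k centroids."""
    vectors = [tuple(v) for v in vectors]
    k = min(k, len(vectors))
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)  # initial centroids drawn from the data
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda i: sq_dist(v, centroids[i]))
            groups[nearest].append(v)
        # An empty group keeps its previous centroid.
        updated = [mean(g) if g else centroids[i] for i, g in enumerate(groups)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def global_centroids(peers, k, group_size=2):
    """Cluster each peer locally, then merge level by level (assumed scheme)."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    # Level 0: every peer clusters its own documents.
    level = [kmeans(docs, k) for docs in peers]
    # Higher levels: neighbouring peers pool their centroids and re-cluster
    # until a single global set of centroids remains.
    while len(level) > 1:
        merged = []
        for i in range(0, len(level), group_size):
            pooled = [c for summary in level[i:i + group_size] for c in summary]
            merged.append(kmeans(pooled, k))
        level = merged
    return level[0]
```

Calling `global_centroids(peers, k)` with `peers` as a list of per-peer vector lists returns the final centroids, which each peer could then use to assign incoming documents or queries to a global cluster. Only centroids travel between levels in this sketch, so raw documents stay on the peer that holds them.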

Index Terms

Computer Science
Information Sciences

Keywords

Clustering Distributed Knowledge Discovery K-means Algorithm Supernodes Intercluster Intracluster