Web Document Clustering using Proposed Similarity Measure

Research Article

Published in December 2014 by P. H. Govardhan, K. P. Wagh, P. N. Chatur

National Conference on Emerging Trends in Computer Technology

Foundation of Computer Science USA

NCETCT - Number 2

December 2014

P. H. Govardhan, K. P. Wagh, P. N. Chatur. Web Document Clustering using Proposed Similarity Measure. National Conference on Emerging Trends in Computer Technology. NCETCT, 2 (December 2014), 15-18.

@article{

author = { P. H. Govardhan, K. P. Wagh, P. N. Chatur },

title = { Web Document Clustering using Proposed Similarity Measure },

journal = { National Conference on Emerging Trends in Computer Technology },

issue_date = { December 2014 },

volume = { NCETCT },

number = { 2 },

month = { December },

year = { 2014 },

issn = { 0975-8887 },

pages = { 15-18 },

numpages = 4,

url = { /proceedings/ncetct/number2/19088-4022/ },

publisher = {Foundation of Computer Science (FCS), NY, USA},

address = {New York, USA}

}

%0 Proceeding Article

%1 National Conference on Emerging Trends in Computer Technology

%A P. H. Govardhan

%A K. P. Wagh

%A P. N. Chatur

%T Web Document Clustering using Proposed Similarity Measure

%J National Conference on Emerging Trends in Computer Technology

%@ 0975-8887

%V NCETCT

%N 2

%P 15-18

%D 2014

%I International Journal of Computer Applications

Abstract

Recent advances in data warehousing and data mining have given rise to many types of information sources. Web documents are the most useful information resources of this era, and using them efficiently is essential for knowledge discovery. Documents that provide related information should be grouped into the same cluster, but measuring the similarity between documents is a challenging task. Various similarity measures have been introduced earlier to address problems related to clustering. The aim of this paper is to propose a new similarity measure that yields better clustering results. Previous research has not taken into account the features that are present in or absent from documents; the proposed similarity measure considers both present and absent features. Focusing on the similarity measure in this way will benefit document mining techniques.
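The abstract does not state the proposed formula. As a hedged illustration of what "considering both present and absent features" can mean in practice, the sketch below implements, as we understand it, the SMTP measure of Lin, Jiang and Lee ("A Similarity Measure for Text Classification and Clustering", IEEE Transactions on Knowledge and Data Engineering) over TF-IDF document vectors. It is not the measure proposed in this paper. The function names are hypothetical, and the smoothed IDF log(1 + N/df), the penalty weight λ = 1, and the fallback σ = 1 for features with too little data are our own assumptions.

```python
import math
from collections import Counter

# Hypothetical helper names. This sketches the SMTP measure of Lin, Jiang and Lee
# as we understand it; it is NOT the similarity measure proposed in this paper.

def tf_idf_vectors(docs):
    """Map each document to a TF-IDF vector over a shared, sorted vocabulary.

    Assumption: raw term counts times a smoothed IDF, log(1 + N/df), so that
    every term present in a document gets a strictly positive weight.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    df = {t: sum(1 for toks in tokenized if t in toks) for t in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append([tf[t] * math.log(1 + n / df[t]) for t in vocab])
    return vocab, vectors


def feature_sigmas(vectors):
    """Standard deviation of the non-zero values of each feature (SMTP's sigma_j).

    Assumption: fall back to 1.0 when a feature has fewer than two non-zero
    values or zero spread, to avoid dividing by zero.
    """
    sigmas = []
    for j in range(len(vectors[0])):
        vals = [v[j] for v in vectors if v[j] > 0]
        if len(vals) < 2:
            sigmas.append(1.0)
            continue
        mean = sum(vals) / len(vals)
        var = sum((x - mean) ** 2 for x in vals) / len(vals)
        sigmas.append(math.sqrt(var) or 1.0)
    return sigmas


def smtp_similarity(d1, d2, sigmas, lam=1.0):
    """Presence/absence-aware similarity in [0, 1]; lam is the penalty weight."""
    total = 0.0  # sum of per-feature contributions
    count = 0    # features present in at least one of the two documents
    for a, b, s in zip(d1, d2, sigmas):
        if a > 0 and b > 0:
            # Present in both: contributes 0.5..1, higher when weights are closer.
            total += 0.5 * (1 + math.exp(-(((a - b) / s) ** 2)))
            count += 1
        elif a > 0 or b > 0:
            # Present in exactly one document: fixed penalty of -lam.
            total -= lam
            count += 1
        # Absent from both documents: ignored entirely.
    if count == 0:
        return 0.0  # assumption: two empty vectors share no evidence
    f = total / count
    return (f + lam) / (1 + lam)


docs = [
    "web document clustering similarity",
    "document clustering similarity measure",
    "wireless network routing protocol",
]
vocab, vecs = tf_idf_vectors(docs)
sig = feature_sigmas(vecs)
s01 = smtp_similarity(vecs[0], vecs[1], sig)
s02 = smtp_similarity(vecs[0], vecs[2], sig)
print(s01)  # → 0.6
print(s02)  # → 0.0
```

With λ = 1, the first two documents share three features with identical weights and differ in two, giving F = (3 − 2)/5 = 0.2 and a similarity of 0.6; the third document shares no term with the first, so every feature is penalised and the score is 0. Features absent from both documents never enter the sum, while a feature present in only one document is penalised explicitly rather than merely contributing nothing, as it would in cosine similarity.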

References

Yung-Shen Lin, Jung-Yi Jiang and Shie-Jue Lee, "A Similarity Measure for Text Classification and Clustering", IEEE Transactions on Knowledge and Data Engineering, 2013.
Gaddam Saidi Reddy and R. V. Krishnaiah, "Clustering Algorithm with a Novel Similarity Measure", IOSR Journal of Computer Engineering (IOSRJCE), Vol. 4, No. 6, pp. 37-42, Sep-Oct 2012.
Shady Shehata, Fakhri Karray and Mohamed S. Kamel, "An Efficient Concept-Based Mining Model for Enhancing Text Clustering", IEEE Transactions on Knowledge and Data Engineering, Vol. 22, No. 10, October 2010.
Anna Huang, "Similarity Measures for Text Document Clustering", New Zealand Computer Science Research Student Conference (NZCSRSC), Christchurch, New Zealand, April 2008.
H. Chim and X. Deng, "Efficient phrase-based document similarity for clustering", IEEE Transactions on Knowledge and Data Engineering, Vol. 20, No. 9, pp. 1217-1229, 2008.
Yanhong Zhai and Bing Liu, "Web Data Extraction Based on Partial Tree Alignment", International World Wide Web Conference (WWW 2005), IW3C2/ACM, 2005.
J. Kogan, M. Teboulle and C. K. Nicholas, "Data driven similarity measures for k-means like clustering algorithms", Information Retrieval, Vol. 8, No. 2, pp. 331-349, 2005.
I. S. Dhillon, J. Kogan and C. Nicholas, "Feature Selection and Document Clustering", in M. W. Berry (Ed.), A Comprehensive Survey of Text Mining, 2003.
Syed Masum Emran and Nong Ye, "Robustness of Canberra Metric in Computer Intrusion Detection", IEEE Workshop on Information Assurance and Security, United States Military Academy, West Point, NY, 5-6 June 2001.
Alexander Strehl, Joydeep Ghosh and Raymond Mooney, "Impact of Similarity Measures on Web-page Clustering", Workshop on Artificial Intelligence for Web Search, July 2000.

Index Terms

Computer Science

Information Sciences

Keywords

Cluster, Document Vector, Inverse Document Frequency, Similarity Measure, Term Frequency, Web Document.