K-Means Clustering Algorithm based on Entity Resolution

B. Vinay Kumar; B. Raghu Ram; B. Hanmanthu


Research Article


International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 108 - Number 6

Year of Publication: 2014

Authors: B. Vinay Kumar, B. Raghu Ram, B. Hanmanthu

DOI: 10.5120/18919-0254

B. Vinay Kumar, B. Raghu Ram, B. Hanmanthu. K-Means Clustering Algorithm based on Entity Resolution. International Journal of Computer Applications. 108, 6 (December 2014), 41-44. DOI=10.5120/18919-0254

@article{10.5120/18919-0254,
  author = { B. Vinay Kumar and B. Raghu Ram and B. Hanmanthu },
  title = { K-Means Clustering Algorithm based on Entity Resolution },
  journal = { International Journal of Computer Applications },
  issue_date = { December 2014 },
  volume = { 108 },
  number = { 6 },
  month = { December },
  year = { 2014 },
  issn = { 0975-8887 },
  pages = { 41-44 },
  numpages = { 4 },
  url = { https://ijcaonline.org/archives/volume108/number6/18919-0254/ },
  doi = { 10.5120/18919-0254 },
  publisher = { Foundation of Computer Science (FCS), NY, USA },
  address = { New York, USA }
}

%0 Journal Article
%1 2024-02-06T22:42:19.074004+05:30
%A B. Vinay Kumar
%A B. Raghu Ram
%A B. Hanmanthu
%T K-Means Clustering Algorithm based on Entity Resolution
%J International Journal of Computer Applications
%@ 0975-8887
%V 108
%N 6
%P 41-44
%D 2014
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Entity resolution (ER) is the problem of recognizing which entries in a database refer to the same real-world entity. ER has to be run so that it keeps running time low while still producing good results. This paper investigates how the running time of ER can be reduced, with a minimum amount of work, by using the k-means clustering algorithm. In this approach, entries are clustered according to how well they match. We use k-means clustering to maximize the number of matching entries identified with a limited amount of work, and we illustrate the potential gains of this k-means-based approach to entity resolution.
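
The idea of clustering entries first and matching only within clusters can be illustrated with a minimal Python sketch. It is not the authors' implementation: the hashed character-bigram features, the token-overlap (Jaccard) similarity, the threshold of 0.5, and all function names (`featurize`, `kmeans`, `candidate_pairs`, `resolve`) are assumptions chosen for illustration only.

```python
import random
from itertools import combinations


def featurize(record, dims=16):
    """Hash character bigrams of the lowercased record into a count vector.
    This feature choice is an assumption, not taken from the paper."""
    text = record.lower()
    vec = [0.0] * dims
    for a, b in zip(text, text[1:]):
        vec[(ord(a) * 31 + ord(b)) % dims] += 1.0
    return vec


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means (Lloyd's iterations); returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid
        # (ties broken by the lowest centroid index).
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def candidate_pairs(records, k):
    """Only records that share a cluster are proposed for comparison."""
    labels = kmeans([featurize(r) for r in records], k)
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    pairs = []
    for members in clusters.values():
        pairs.extend(combinations(members, 2))
    return pairs


def similarity(a, b):
    """Jaccard overlap of lowercased word tokens (a hypothetical match function)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def resolve(records, k, threshold=0.5):
    """Return index pairs judged to refer to the same entity."""
    return [
        (i, j)
        for i, j in candidate_pairs(records, k)
        if similarity(records[i], records[j]) >= threshold
    ]


if __name__ == "__main__":
    records = ["John Smith", "john smith", "Mary Jones",
               "Alan Turing", "mary jones", "Grace Hopper"]
    print(sorted(resolve(records, k=3)))
```

The saving comes from `candidate_pairs`: the expensive comparison runs only on pairs inside a cluster instead of on all n(n-1)/2 pairs. The trade-off is that a true match split across two clusters is never compared, so the choice of features and of k affects recall.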

References

A. K. Elmagarmid, P. G. Ipeirotis, and V. S. Verykios, "Duplicate Record Detection: A Survey," IEEE Trans. Knowledge Data Eng., vol. 19, no. 1, pp. 1-16, Jan. 2007.
A. K. Jain, M. N. Murty, and P. J. Flynn, "Data Clustering: A Review," ACM Computing Surveys, vol. 31, no. 3, pp. 264-323, 1999.
H. B. Newcombe and J. M. Kennedy, "Record Linkage: Making Maximum Use of the Discriminating Power of Identifying Information," Comm. ACM, vol. 5, no. 11, pp. 563-566, 1962.
M. A. Hernández and S. J. Stolfo, "The Merge/Purge Problem for Large Databases," Proc. ACM SIGMOD Int'l Conf. Management of Data, pp. 127-138, 1995.
A. K. McCallum, K. Nigam, and L. Ungar, "Efficient Clustering of High-Dimensional Data Sets with Application to Reference Matching," Proc. ACM Sixth SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 169-178, 2000.
A. Gionis, P. Indyk, and R. Motwani, "Similarity Search in High Dimensions via Hashing," Proc. 25th Int'l Conf. Very Large Databases (VLDB), pp. 518-529, 1999.
X. Dong, A. Y. Halevy, and J. Madhavan, "Reference Reconciliation in Complex Information Spaces," Proc. ACM SIGMOD Int'l Conf. Management of Data, pp. 85-96, 2005.
M. Weis and F. Naumann, "Detecting Duplicates in Complex XML Data," Proc. 22nd Int'l Conf. Data Eng. (ICDE), p. 109, 2006.

Index Terms

Computer Science

Information Sciences

Keywords

Data cleaning, Entity resolution, K-means Clustering Algorithm