Research Article

Improved One-to-Many Record Linkage using One-Class Clustering Tree

Published on May 2014 by Sunandhini, S Suguna, M Sharmila. D
International Conference on Simulations in Computing Nexus
Foundation of Computer Science USA
ICSCN - Number 2
May 2014
Authors: Sunandhini, S Suguna, M Sharmila. D

Sunandhini, S Suguna, M Sharmila. D. Improved One-to-Many Record Linkage using One-Class Clustering Tree. International Conference on Simulations in Computing Nexus. ICSCN, 2 (May 2014), 23-26.

@article{
author = { Sunandhini, S Suguna, M Sharmila. D },
title = { Improved One-to-Many Record Linkage using One-Class Clustering Tree },
journal = { International Conference on Simulations in Computing Nexus },
issue_date = { May 2014 },
volume = { ICSCN },
number = { 2 },
month = { May },
year = { 2014 },
issn = { 0975-8887 },
pages = { 23-26 },
numpages = 4,
url = { /proceedings/icscn/number2/16155-1022/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 International Conference on Simulations in Computing Nexus
%A Sunandhini
%A S Suguna
%A M Sharmila. D
%T Improved One-to-Many Record Linkage using One-Class Clustering Tree
%J International Conference on Simulations in Computing Nexus
%@ 0975-8887
%V ICSCN
%N 2
%P 23-26
%D 2014
%I International Journal of Computer Applications
Abstract

Record linkage is traditionally performed among entities of the same type, which may or may not share a common identifier. In this paper we propose a linkage method that also links matching entities of different data types. The proposed technique is based on a one-class clustering tree that characterizes the entities to be linked. The tree is built so that it is easy to understand and can be transformed into association rules. The inner nodes of the tree consist of features of the first set of entities, and the leaves represent the features of the matching entities in the second set. The data is split using two splitting criteria, and two pruning methods are used in building the one-class clustering tree. The proposed system yields better precision and recall.
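The tree structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name its two splitting criteria or its two pruning methods, so the mean pairwise Jaccard similarity used in `choose_split` and the `max_sim` threshold used as a crude pre-pruning rule are assumptions chosen for illustration, as are all function names and the toy data. The tree is trained only on matching pairs (the "one class"); inner nodes test attributes of the first set, and each leaf stores counts of the matching records from the second set.

```python
from collections import Counter
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def choose_split(pairs, a_attrs, max_sim):
    """Pick the first-set attribute whose values partition the matched
    second-set records into the least-overlapping groups.
    ASSUMPTION: mean pairwise Jaccard is an illustrative criterion only."""
    best_attr, best_score = None, max_sim
    for attr in a_attrs:
        groups = {}
        for a_rec, b_rec in pairs:
            groups.setdefault(a_rec[attr], set()).add(b_rec)
        if len(groups) < 2:
            continue  # attribute has a single value here, nothing to split
        sims = [jaccard(g1, g2) for g1, g2 in combinations(groups.values(), 2)]
        score = sum(sims) / len(sims)
        if score < best_score:  # accept only splits below the threshold
            best_attr, best_score = attr, score
    return best_attr


def build_tree(pairs, a_attrs, max_sim=0.5):
    """Recursively build the tree from matching (a_rec, b_rec) pairs."""
    attr = choose_split(pairs, a_attrs, max_sim)
    if attr is None:
        # Leaf: a simple frequency model of the matching second-set records.
        return {"leaf": Counter(b for _, b in pairs)}
    remaining = [x for x in a_attrs if x != attr]
    children = {}
    for a_rec, b_rec in pairs:
        children.setdefault(a_rec[attr], []).append((a_rec, b_rec))
    return {
        "split": attr,
        "children": {v: build_tree(p, remaining, max_sim)
                     for v, p in children.items()},
    }


def candidates(tree, a_rec):
    """Follow a first-set record down the tree and return its leaf model."""
    while "split" in tree:
        tree = tree["children"].get(a_rec[tree["split"]])
        if tree is None:
            return Counter()  # attribute value never seen in training
    return tree["leaf"]


# Toy data (hypothetical): customers (first set) linked to genres (second set).
pairs = [
    ({"city": "Paris", "plan": "basic"}, "drama"),
    ({"city": "Paris", "plan": "premium"}, "drama"),
    ({"city": "Rome", "plan": "basic"}, "sport"),
    ({"city": "Rome", "plan": "premium"}, "sport"),
]
tree = build_tree(pairs, ["city", "plan"])
rome_matches = candidates(tree, {"city": "Rome", "plan": "basic"})
```

In this toy example the tree splits on `city` because it separates the matched genres cleanly, while `plan` is rejected by the threshold. Each root-to-leaf path can be read as a rule of the form "if city = Rome then the matching genre is likely sport", which is the sense in which such a tree can be turned into association rules.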

Index Terms

Computer Science
Information Sciences

Keywords

Linkage, Clustering, Splitting, Decision Tree