Research Article

Review on Record Linkage and Deduplication based on Suffix Array Indexing

by Warke Yamini, Arti Mohanpurkar
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 108 - Number 6
Year of Publication: 2014
Authors: Warke Yamini, Arti Mohanpurkar
DOI: 10.5120/18916-0243

Warke Yamini, Arti Mohanpurkar. Review on Record Linkage and Deduplication based on Suffix Array Indexing. International Journal of Computer Applications. 108, 6 (December 2014), 28-30. DOI=10.5120/18916-0243

@article{ 10.5120/18916-0243,
author = { Warke Yamini and Arti Mohanpurkar },
title = { Review on Record Linkage and Deduplication based on Suffix Array Indexing },
journal = { International Journal of Computer Applications },
issue_date = { December 2014 },
volume = { 108 },
number = { 6 },
month = { December },
year = { 2014 },
issn = { 0975-8887 },
pages = { 28-30 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume108/number6/18916-0243/ },
doi = { 10.5120/18916-0243 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:42:16.968756+05:30
%A Warke Yamini
%A Arti Mohanpurkar
%T Review on Record Linkage and Deduplication based on Suffix Array Indexing
%J International Journal of Computer Applications
%@ 0975-8887
%V 108
%N 6
%P 28-30
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Record linkage is an important process for data quality: it combines and matches records from two or more databases that refer to the same entities, and removes the duplicates among them. Deduplication is the process of removing duplicate records within a single database. Data cleaning and standardization have become demanding processes, and given the growing capacity of today's databases, finding matching records in a single database is a crucial task. Indexing techniques, specifically suffix arrays, are used to implement record linkage and deduplication efficiently.
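As an illustration of the indexing idea mentioned in the abstract, the following is a minimal sketch of suffix-based blocking as it is commonly described in the record linkage literature. It is not the authors' implementation. The function names, the `min_len` and `max_block` parameters, and the sample records are assumptions introduced only for this example. Each record's blocking key is expanded into its suffixes, the suffixes are stored in a sorted inverted index, and only records that share a suffix are compared.

```python
from collections import defaultdict
from itertools import combinations


def suffixes(key, min_len):
    """Return every suffix of `key` that has at least `min_len` characters."""
    return [key[i:] for i in range(len(key) - min_len + 1)]


def suffix_blocks(records, min_len=4, max_block=10):
    """Build a sorted inverted index: suffix -> sorted list of record ids.

    `records` maps a record id to its blocking key value (a string).
    Suffixes held by a single record are useless for matching, and suffixes
    shared by more than `max_block` records are dropped as too common.
    Both thresholds are hypothetical defaults chosen for this sketch.
    """
    index = defaultdict(set)
    for rec_id, key in records.items():
        for suf in suffixes(key, min_len):
            index[suf].add(rec_id)
    return {suf: sorted(ids) for suf, ids in sorted(index.items())
            if 2 <= len(ids) <= max_block}


def candidate_pairs(blocks):
    """Collect the record pairs that share at least one suffix block."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(ids, 2))
    return pairs


# Hypothetical blocking keys, for example standardized first names.
records = {1: "catherine", 2: "katherine", 3: "kathryn", 4: "smith"}
blocks = suffix_blocks(records)
print(sorted(candidate_pairs(blocks)))  # prints [(1, 2)]
```

Here "catherine" and "katherine" differ only in the first character, so they share the suffixes "atherine", "therine", "herine", "erine" and "rine" and become a candidate pair, while the remaining records are never compared. A detailed similarity comparison would then run only on the candidate pairs rather than on all pairs of records.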

Index Terms

Computer Science
Information Sciences

Keywords

Record linkage, suffix array, blocking