Research Article

Development of Replica Free Repositories using Particle Swarm Optimization Algorithm

by Jeby K Luthiya, C. Umamaheswari
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 66 - Number 20
Year of Publication: 2013

Jeby K Luthiya, C. Umamaheswari. Development of Replica Free Repositories using Particle Swarm Optimization Algorithm. International Journal of Computer Applications. 66, 20 (March 2013), 8-13. DOI=10.5120/11198-6213

@article{10.5120/11198-6213,
author = { Jeby K Luthiya and C. Umamaheswari },
title = { Development of Replica Free Repositories using Particle Swarm Optimization Algorithm },
journal = { International Journal of Computer Applications },
issue_date = { March 2013 },
volume = { 66 },
number = { 20 },
month = { March },
year = { 2013 },
issn = { 0975-8887 },
pages = { 8-13 },
numpages = { 6 },
doi = { 10.5120/11198-6213 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

The increasing volume of information available in digital media has become a challenging problem for data administrators. Data repositories such as those used by digital libraries and e-commerce brokers are usually built from data gathered from different sources, so they contain records with disparate schemata and structures. This growth has also introduced redundant data into the databases, which makes a method for controlling redundancy and duplication essential. The proposed approach uses the PSO (Particle Swarm Optimization) algorithm to generate an optimal similarity measure for deciding whether data is duplicate. PSO is first applied to the training datasets to obtain this optimal similarity measure. The remaining datasets are then deduplicated using the measure that PSO produced.
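The abstract describes a two-stage pipeline: learn a similarity measure on training data with PSO, then apply it to the remaining data. It does not give the form of the measure, the fitness function, or the PSO settings. The Python sketch below is one hypothetical reading, not the authors' implementation. It makes the following assumptions:

* The similarity measure is a weighted mean of per-field string similarities, computed with the standard library's `difflib.SequenceMatcher.ratio()`, plus a decision threshold.
* Each particle's position encodes those weights and the threshold.
* The fitness is classification accuracy on labelled training pairs.

The record fields, sample data, and PSO parameters (inertia `w=0.7`, `c1=c2=1.5`, 20 particles, 50 iterations) are illustrative assumptions.

```python
import random
from difflib import SequenceMatcher


def field_similarities(a, b):
    """Per-field string similarity in [0, 1] via difflib's ratio()."""
    return [SequenceMatcher(None, x.lower(), y.lower()).ratio() for x, y in zip(a, b)]


def is_duplicate(sims, position):
    """Hypothetical similarity measure: weighted mean of the field
    similarities compared with a threshold. position = [w_1, ..., w_k, threshold]."""
    *weights, threshold = position
    total = sum(weights)
    if total == 0:
        return False  # all weights collapsed to zero: treat as "not duplicate"
    score = sum(w * s for w, s in zip(weights, sims)) / total
    return score >= threshold


def accuracy(position, data):
    """Fitness: fraction of labelled training pairs classified correctly."""
    hits = sum(is_duplicate(sims, position) == label for sims, label in data)
    return hits / len(data)


def pso(data, dims, n_particles=20, iters=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO over [0, 1]^dims that maximizes accuracy()."""
    rng = random.Random(seed)
    pos = [[rng.random() for _ in range(dims)] for _ in range(n_particles)]
    vel = [[0.0] * dims for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # personal best positions
    pbest_fit = [accuracy(p, data) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]     # swarm-wide best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dims):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Keep every coordinate (weights and threshold) inside [0, 1].
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            fit = accuracy(pos[i], data)
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


# Hypothetical labelled training pairs: (record_a, record_b, is_duplicate).
# Fields are (name, address, city).
train = [
    (("John Smith", "12 Oak St", "Boston"), ("Jon Smith", "12 Oak Street", "Boston"), True),
    (("Mary Jones", "5 Elm Rd", "Denver"), ("Mary Jones", "5 Elm Road", "Denver"), True),
    (("John Smith", "12 Oak St", "Boston"), ("Alice Brown", "9 Pine Ave", "Austin"), False),
    (("Mary Jones", "5 Elm Rd", "Denver"), ("Peter Wu", "300 Hill Blvd", "Seattle"), False),
]
data = [(field_similarities(a, b), label) for a, b, label in train]
best, fit = pso(data, dims=4)  # 3 field weights + 1 threshold

# Apply the learned measure to the remaining (unlabelled) pairs.
remaining = [
    (("Alice Brown", "9 Pine Ave", "Austin"), ("Alice Brown", "9 Pine Ave.", "Austin")),
    (("Alice Brown", "9 Pine Ave", "Austin"), ("Mary Jones", "5 Elm Rd", "Denver")),
]
flags = [is_duplicate(field_similarities(a, b), best) for a, b in remaining]
print(f"training accuracy={fit:.2f}, duplicate flags={flags}")
```

Accuracy is used as the fitness only to keep the sketch short. Duplicates are usually rare in real repositories, so an imbalance-aware score such as F1 would likely be a better fitness in practice.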

Index Terms

Computer Science
Information Sciences


Keywords

PSO Algorithm, Genetic Algorithm, Database administration, Evolutionary computing, Database integration