Research Article

A Survey of Software Clone Detection Techniques

by Abdullah Sheneamer, Jugal Kalita
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 137 - Number 10
Year of Publication: 2016
Authors: Abdullah Sheneamer, Jugal Kalita
10.5120/ijca2016908896

Abdullah Sheneamer, Jugal Kalita. A Survey of Software Clone Detection Techniques. International Journal of Computer Applications. 137, 10 (March 2016), 1-21. DOI=10.5120/ijca2016908896

@article{ 10.5120/ijca2016908896,
author = { Abdullah Sheneamer, Jugal Kalita },
title = { A Survey of Software Clone Detection Techniques },
journal = { International Journal of Computer Applications },
issue_date = { March 2016 },
volume = { 137 },
number = { 10 },
month = { March },
year = { 2016 },
issn = { 0975-8887 },
pages = { 1-21 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume137/number10/24308-2016908896/ },
doi = { 10.5120/ijca2016908896 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T23:37:58.791781+05:30
%A Abdullah Sheneamer
%A Jugal Kalita
%T A Survey of Software Clone Detection Techniques
%J International Journal of Computer Applications
%@ 0975-8887
%V 137
%N 10
%P 1-21
%D 2016
%I Foundation of Computer Science (FCS), NY, USA
Abstract

If two fragments of source code are identical or similar to each other, they are called code clones. Code clones make software maintenance harder and can propagate bugs. Clones arise for several reasons, such as code reuse by copying pre-existing fragments, coding style, and repeated computation using duplicated functions with slight changes in the variables or data structures used. If a code fragment is edited, all related clones must be checked to see whether they need to be modified as well. Removal, avoidance, and refactoring of cloned code are other important issues in software maintenance. However, several research studies have demonstrated that removal or refactoring of cloned code is sometimes harmful. This study discusses code clones, common types of clones, phases of clone detection, the state of the art in code clone detection techniques and tools, and challenges faced by clone detection techniques.

Index Terms

Computer Science
Information Sciences

Keywords

Software Clone, Code Clone, Duplicated Code Detection, Clone Detection