Research Article

An Improved Method to Identify Exact and Approximate Tandem Repeats in DNA Sequences using Biclustering

by Pamela Vinitha Eric, Kusum Rajput, Gopakumar G.
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 146 - Number 9
Year of Publication: 2016
10.5120/ijca2016910851

Pamela Vinitha Eric, Kusum Rajput, Gopakumar G. An Improved Method to Identify Exact and Approximate Tandem Repeats in DNA Sequences using Biclustering. International Journal of Computer Applications. 146, 9 (Jul 2016), 1-5. DOI=10.5120/ijca2016910851

@article{ 10.5120/ijca2016910851,
author = { Pamela Vinitha Eric and Kusum Rajput and Gopakumar G. },
title = { An Improved Method to Identify Exact and Approximate Tandem Repeats in DNA Sequences using Biclustering },
journal = { International Journal of Computer Applications },
issue_date = { Jul 2016 },
volume = { 146 },
number = { 9 },
month = { Jul },
year = { 2016 },
issn = { 0975-8887 },
pages = { 1-5 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume146/number9/25423-2016910851/ },
doi = { 10.5120/ijca2016910851 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T23:49:55.601756+05:30
%A Pamela Vinitha Eric
%A Kusum Rajput
%A Gopakumar G.
%T An Improved Method to Identify Exact and Approximate Tandem Repeats in DNA Sequences using Biclustering
%J International Journal of Computer Applications
%@ 0975-8887
%V 146
%N 9
%P 1-5
%D 2016
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Tandem repeats occur frequently in eukaryotic and prokaryotic genomic sequences. They are implicated in several inherited human diseases and are relevant to DNA fingerprinting, evolution and regulatory processes. Despite their importance, tandem repeat detection remains unresolved in the sense that existing detection tools do not give the same results for a given input sequence. This is mainly due to differences in the methods adopted by the search algorithms and in the parameter settings they require. This paper proposes an efficient method to identify all exact and approximate tandem repeats within a given DNA sequence and to detect any changes brought about by mutation. The method first identifies all potential tandem repeats by K-means clustering, then applies biclustering to filter out the actual repeats and to report the positions at which approximate tandem repeats occur. The results obtained by this method are consistent with those of existing methods.
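To make the notion of an exact tandem repeat concrete, the following is a minimal Python sketch of a naive scan that reports runs of a short unit repeated back to back. It is not the K-means and biclustering method proposed in the paper, whose details are not given in the abstract. The function name, its parameters and the default unit-length limits are assumptions chosen for illustration only.

```python
def is_primitive(unit):
    """True if unit is not itself a repetition of a shorter string."""
    # A string s is a repetition of a shorter string iff s occurs in (s+s)[1:-1].
    return unit not in (unit + unit)[1:-1]


def find_exact_tandem_repeats(seq, min_unit=1, max_unit=6, min_copies=2):
    """Naive scan for exact tandem repeats (illustrative, not the paper's method).

    Returns a sorted list of (start, unit, copies) tuples, where `unit`
    occurs `copies` times back to back beginning at index `start`.
    """
    results = []
    n = len(seq)
    for unit_len in range(min_unit, max_unit + 1):
        i = 0
        while i + unit_len * min_copies <= n:
            unit = seq[i:i + unit_len]
            if not is_primitive(unit):
                # e.g. "AA" is already covered by unit "A"
                i += 1
                continue
            copies = 1
            j = i + unit_len
            # Extend the run while the next block equals the unit.
            while seq[j:j + unit_len] == unit:
                copies += 1
                j += unit_len
            if copies >= min_copies:
                results.append((i, unit, copies))
                # Skip phase-shifted copies of the same run, but keep the
                # last unit_len - 1 characters in case a new repeat starts there.
                i = j - unit_len + 1
            else:
                i += 1
    return sorted(results)


print(find_exact_tandem_repeats("ACGACGACGTT", max_unit=3))
# prints [(0, 'ACG', 3), (9, 'T', 2)]
```

Approximate tandem repeats, in which copies differ by substitutions, insertions or deletions, require a tolerance criterion that this exact-match sketch does not attempt.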

Index Terms

Computer Science
Information Sciences

Keywords

Tandem Repeat, DNA Sequence, Micro Satellites, Mini Satellites, Clustering