Efficient Algorithm for Extracting Complete Repeats from Biological Sequences

Munina Yusufu; Gulina Yusufu

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 128 - Number 16

Year of Publication: 2015

Authors: Munina Yusufu, Gulina Yusufu

10.5120/ijca2015906752

Munina Yusufu, Gulina Yusufu. Efficient Algorithm for Extracting Complete Repeats from Biological Sequences. International Journal of Computer Applications. 128, 16 (October 2015), 33-37. DOI=10.5120/ijca2015906752

@article{ 10.5120/ijca2015906752,
author = { Munina Yusufu and Gulina Yusufu },
title = { Efficient Algorithm for Extracting Complete Repeats from Biological Sequences },
journal = { International Journal of Computer Applications },
issue_date = { October 2015 },
volume = { 128 },
number = { 16 },
month = { October },
year = { 2015 },
issn = { 0975-8887 },
pages = { 33-37 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume128/number16/22960-2015906752/ },
doi = { 10.5120/ijca2015906752 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

%0 Journal Article

%1 2024-02-06T23:21:54.760501+05:30

%A Munina Yusufu

%A Gulina Yusufu

%T Efficient Algorithm for Extracting Complete Repeats from Biological Sequences

%J International Journal of Computer Applications

%@ 0975-8887

%V 128

%N 16

%P 33-37

%D 2015

%I Foundation of Computer Science (FCS), NY, USA

Abstract

In this paper, an approach for efficiently extracting the repeating patterns in a biological sequence is proposed. A repeating pattern is a subsequence that appears more than once in a sequence; such patterns are among the most important features for revealing functional or evolutionary relationships in biological sequences. The algorithm rapidly scans the string to find repeating regions, describing each repeating substring by its length, occurrence positions, and occurrence frequency. The algorithm executes in linear time and space, independent of alphabet size. It can also restrict its output to complete repeats whose length (period) p satisfies p ≥ pmin, where pmin ≥ 1 is a user-specified minimum. The algorithm outputs complete repeats and can be extended or applied to other situations, for example computing maximal repeats or finding common motifs in a set of biological sequences.
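The abstract does not spell out the algorithm itself, so the following is only a minimal illustrative sketch of the general idea, not the authors' method. It assumes a standard suffix array plus longest-common-prefix (LCP) array approach and reports each right-maximal repeated substring by its length and the full list of its occurrence positions (the frequency is the length of that list). The function names `suffix_array`, `lcp_array`, and `complete_repeats` are hypothetical, and the suffix array is built by naive sorting, so this sketch does not achieve the linear time claimed in the paper.

```python
from typing import List, Tuple


def suffix_array(s: str) -> List[int]:
    # Naive construction by sorting suffixes; adequate for a small demo only.
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_array(s: str, sa: List[int]) -> List[int]:
    # Kasai-style computation: lcp[r] is the length of the longest common
    # prefix of the suffixes at ranks r-1 and r; lcp[0] is 0 by convention.
    n = len(s)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp


def complete_repeats(s: str, pmin: int = 1) -> List[Tuple[int, List[int]]]:
    """Return (length, sorted positions) for every right-maximal repeated
    substring of s whose length is at least pmin (an assumed reading of the
    abstract's p >= pmin restriction)."""
    sa = suffix_array(s)
    lcp = lcp_array(s, sa)
    n = len(s)
    out: List[Tuple[int, List[int]]] = []
    stack: List[Tuple[int, int]] = []  # (lcp value, left boundary rank)
    for r in range(1, n + 1):
        cur = lcp[r] if r < n else 0  # sentinel 0 closes all open intervals
        left = r - 1
        while stack and stack[-1][0] > cur:
            length, lb = stack.pop()
            if length >= pmin:
                # Ranks lb..r-1 share a prefix of this length: all occurrences.
                out.append((length, sorted(sa[lb:r])))
            left = lb
        if cur > 0 and (not stack or stack[-1][0] < cur):
            stack.append((cur, left))
    return out


if __name__ == "__main__":
    for length, positions in complete_repeats("ACGACGT", pmin=2):
        print(length, positions, len(positions))
```

For the toy input `ACGACGT` with `pmin=2`, the sketch reports the substring of length 3 at positions 0 and 3 (`ACG`) and the substring of length 2 at positions 1 and 4 (`CG`). A further filter for left-extendibility would be needed to keep only maximal repeats.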

Index Terms

Computer Science

Information Sciences

Keywords

Complete repeats, Biological sequence, Suffix array, Motif finding