CFP last date
20 May 2024
Reseach Article

The Weighted Factors Automaton : A Tool for DNA Sequences Analysis

by Christiane Hespel, Farida Benmakrouha, Danielle Quichaud
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 70 - Number 4
Year of Publication: 2013
Authors: Christiane Hespel, Farida Benmakrouha, Danielle Quichaud
10.5120/11947-7763

Christiane Hespel, Farida Benmakrouha, Danielle Quichaud . The Weighted Factors Automaton : A Tool for DNA Sequences Analysis. International Journal of Computer Applications. 70, 4 ( May 2013), 1-7. DOI=10.5120/11947-7763

@article{ 10.5120/11947-7763,
author = { Christiane Hespel, Farida Benmakrouha, Danielle Quichaud },
title = { The Weighted Factors Automaton : A Tool for DNA Sequences Analysis },
journal = { International Journal of Computer Applications },
issue_date = { May 2013 },
volume = { 70 },
number = { 4 },
month = { May },
year = { 2013 },
issn = { 0975-8887 },
pages = { 1-7 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume70/number4/11947-7763/ },
doi = { 10.5120/11947-7763 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:31:56.817186+05:30
%A Christiane Hespel
%A Farida Benmakrouha
%A Danielle Quichaud
%T The Weighted Factors Automaton : A Tool for DNA Sequences Analysis
%J International Journal of Computer Applications
%@ 0975-8887
%V 70
%N 4
%P 1-7
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

A lot of computing tools are often used for analyzing DNA sequences like trees, automata, dictionaries, every one being reserved for a particular problem. A. Blumer and al. have proposed a more general computing tool : the smaller automaton recognizing the subwords of a text (DAWG). In this paper we propose the concept of "weighted factors automaton" producing every occurrence of any factor. Its transitions are labeled by the read letter and weighted by the set of the indices of the factors beginnings. The factors are obtained by concatenating the read letters and the indices of the factors beginnings are obtained by computing the intersection of the weighting sets, when advancing from the initial state to a final state. We think that this automaton can be more easily processed than DAWG and we present a comparison between DAWG and our automaton: the set of the factors beginnings indices and the factors frequency are more easily obtained by our automaton and the restriction of our automaton to the factors of length k maintains the automaton structure, when DAWG cannot be easily restricted. The applications are numerous: By selecting factors of length 1, we obtain the coding regions, factors of length 3, we obtain the expression level of some gene. The "weighted factors automaton" allows us to find matches of pattern, to study homology, FASTA and BLAST algorithms being significantly simplified

References
  1. J. BERSTEL, C. REUTENAUER, Rational series and their languages. Springer-Verlag, 1988.
  2. A. BLUMER, J. BLUMER, D. HAUSSLER, A. EHRENFEUCHT, M. T. CHEN, J. DEIFERAS, The smallest automaton recognizing the subwords of a text. Theor. Comput. Sci. 40, 1985, 31–35.
  3. M. CROCHEMORE, Transducers and repetitions. Theor. Comput. Sci. 45, 1986, 63–86.
  4. M. CROCHEMORE, C. HANCART, T. LECROQ, Algorithms on strings. Cambridge University Press, 2007.
  5. F. DARDEL, F. K´EP`E S, Bioinformatique, G´enomique et post-g´enomique. Editions de l'Ecole Polytechnique, 2002.
  6. S. FARO, T. LECROQ, A Fast Suffix Automata Based Algorithm for Exact Online String Matching. LNCS 7276, Springer-Verlag, to appear, 149–158.
  7. P. FLAJOLET, R. SEDGEWICK, Introduction `a l'analyse des algorithmes. International Thomson Publishing, 1998.
  8. M. FLIESS, Un outil alg´ebrique : les s´eries formelles non commutatives. In: G. MARCHESINI, S. K. MITTER (ed. ), Mathematical System Theory. LNEMS 131, Springer-Verlag, 1976, 122–148.
  9. D. GUSFIELD, Algorithms on Strings, Trees and Sequences. Cambridge University Press, 1997.
  10. D. L. HARTL, E. W. JONES, G´en´etique, Les grands principes. Dunod, 2003.
  11. C. HESPEL, Une ´etude des s´eries formelles non commutatives pour l'Approximation et l'Identification des syst`emes dynamiques. Th`ese d'´etat, Univ. of Lille 1, 1998.
  12. C. N. JONES, A. P. PEVZNER, An introduction to Bioinformatics Algorithms. MIT Press, 2004.
  13. A. LEFEBVRE, T. LECROQ, H. DAUCHEL, J. ALEXANDRE, FORRepeats Bioinformatics (2003).
Index Terms

Computer Science
Information Sciences

Keywords

Algorithms on Strings Weighted automata Bioinformatics DNA Sequences Analysis DNA Sequences Biasis