Research Article

Discovering Sequence Motifs of Different Patterns Parallely using DNA Operations

by B.Lavanya, A.Murugan
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 33 - Number 1
Year of Publication: 2011
Authors: B.Lavanya, A.Murugan
DOI: 10.5120/3985-5628

B. Lavanya, A. Murugan. Discovering Sequence Motifs of Different Patterns Parallely using DNA Operations. International Journal of Computer Applications. 33, 1 (November 2011), 18-24. DOI=10.5120/3985-5628

@article{ 10.5120/3985-5628,
author = { B.Lavanya and A.Murugan },
title = { Discovering Sequence Motifs of Different Patterns Parallely using DNA Operations },
journal = { International Journal of Computer Applications },
issue_date = { November 2011 },
volume = { 33 },
number = { 1 },
month = { November },
year = { 2011 },
issn = { 0975-8887 },
pages = { 18-24 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume33/number1/3985-5628/ },
doi = { 10.5120/3985-5628 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:19:02.390109+05:30
%A B.Lavanya
%A A.Murugan
%T Discovering Sequence Motifs of Different Patterns Parallely using DNA Operations
%J International Journal of Computer Applications
%@ 0975-8887
%V 33
%N 1
%P 18-24
%D 2011
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The discovery of motifs in biological sequences, and of various types of subsequences in commercial databases, has varied applications and interpretations. This paper proposes a new approach to solve the Combinatorial Pattern Matching (CPM) problem, search for continuous and gapped rigid subsequences, and discover Longest Common Rigid Subsequences (LCRS) from the given sequences using DNA operations and a modified Position Weight Matrix (PWM). The algorithm and its variations have been tested with both real and simulated databases. The proposed work can be applied to genetic, scientific, and commercial databases. Implementation results show the correctness of the algorithms. Finally, the validity of the algorithms is checked and their time complexity is analyzed.
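To make the terms in the abstract concrete, the following is a minimal Python sketch of two generic building blocks: matching a gapped rigid pattern (fixed positions, with wildcards standing for the gaps) against a sequence, and tallying per-position symbol counts, which is the raw form of a position weight matrix. It is not the DNA-operations algorithm or the modified PWM proposed in the paper. The names `WILDCARD`, `rigid_occurrences`, and `position_counts`, and the use of `.` as the gap symbol, are assumptions made for illustration only.

```python
from collections import Counter

WILDCARD = "."  # hypothetical gap symbol: matches any single character


def rigid_occurrences(pattern, sequence):
    """Return every start index at which the rigid pattern matches.

    A rigid pattern keeps the distance between its symbols fixed, so a
    match is a plain sliding-window comparison that skips wildcards.
    """
    hits = []
    for start in range(len(sequence) - len(pattern) + 1):
        window = sequence[start:start + len(pattern)]
        if all(p == WILDCARD or p == s for p, s in zip(pattern, window)):
            hits.append(start)
    return hits


def position_counts(aligned):
    """Count symbols in each column of equal-length sequences.

    The result is an unnormalised position count matrix, one Counter
    per column; a conventional PWM would be derived from these counts.
    """
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must have equal length")
    return [Counter(column) for column in zip(*aligned)]


if __name__ == "__main__":
    print(rigid_occurrences("A.G", "ACGATGAAG"))
    print(position_counts(["ACG", "ATG", "AAG"]))
```

The sliding-window scan is the simplest sequential formulation; the paper's contribution, as the abstract states, is carrying out this kind of search in parallel with DNA operations.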

Index Terms

Computer Science
Information Sciences

Keywords

DNA operations, Motifs, LCRS, CPM, PWM, Molecular Computing