Discovering Sequence Motifs of Different Patterns Parallely using DNA Operations

International Journal of Computer Applications
© 2011 by IJCA Journal
Volume 33 - Number 1
Year of Publication: 2011
Authors:
B.Lavanya
A.Murugan
DOI: 10.5120/3985-5628

B.Lavanya and A.Murugan. Article: Discovering Sequence Motifs of Different Patterns Parallely using DNA Operations. International Journal of Computer Applications 33(1):18-24, November 2011. Full text available.

@article{key:article,
	author = {B.Lavanya and A.Murugan},
	title = {Article: Discovering Sequence Motifs of Different Patterns Parallely using DNA Operations},
	journal = {International Journal of Computer Applications},
	year = {2011},
	volume = {33},
	number = {1},
	pages = {18-24},
	month = {November},
	note = {Full text available}
}

Abstract

Discovery of motifs in biological sequences, and of various types of subsequences in commercial databases, has varied applications and interpretations. This paper proposes a new approach that uses DNA operations and a modified Position Weight Matrix (PWM) to solve the Combinatorial Pattern Matching (CPM) problem, to search for continuous and gapped rigid subsequences, and to discover Longest Common Rigid Subsequences (LCRS) in the given sequences. The algorithm and its variations have been tested on both real and simulated databases. The proposed work can be applied to genetic and scientific as well as commercial databases. Implementation results show the correctness of the algorithms. Finally, the validity of the algorithms is checked and their time complexity is analyzed.
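The abstract does not say how rigid subsequences are represented. As an ordinary in-silico illustration (not the paper's DNA-operation method), the sketch below assumes a rigid pattern is written as fixed symbols separated by single-symbol wildcards ('.' here, an assumed convention), searches for such a pattern, and finds the longest common rigid subsequence of two sequences by trying every relative shift. The function names `rigid_pattern_occurrences`, `lcrs_two` and `as_pattern` are hypothetical.

```python
from typing import List, Tuple


def rigid_pattern_occurrences(seq: str, pattern: str, wildcard: str = ".") -> List[int]:
    """Start positions where a rigid pattern occurs in seq.

    Each wildcard matches exactly one arbitrary symbol, so the gaps between
    the fixed symbols have fixed (rigid) lengths. A pattern with no wildcards
    is an ordinary continuous substring.
    """
    m = len(pattern)
    return [
        start
        for start in range(len(seq) - m + 1)
        if all(p == wildcard or p == seq[start + k] for k, p in enumerate(pattern))
    ]


def lcrs_two(a: str, b: str) -> Tuple[int, int, List[int]]:
    """Longest common rigid subsequence of two strings by scanning all shifts.

    Positions i_1 < ... < i_k in a and j_1 < ... < j_k in b spell the same rigid
    subsequence only if i_t - j_t is the same shift for every t. For a fixed
    shift the longest such subsequence therefore takes every position where
    the aligned characters agree. Returns (length, shift, positions in a).
    """
    best_len, best_shift, best_pos = 0, 0, []
    for shift in range(-(len(b) - 1), len(a)):  # shift = i - j
        lo, hi = max(0, shift), min(len(a), len(b) + shift)
        pos = [i for i in range(lo, hi) if a[i] == b[i - shift]]
        if len(pos) > best_len:
            best_len, best_shift, best_pos = len(pos), shift, pos
    return best_len, best_shift, best_pos


def as_pattern(a: str, positions: List[int], wildcard: str = ".") -> str:
    """Render the matched positions of a as a rigid pattern with wildcard gaps."""
    if not positions:
        return ""
    keep = set(positions)
    return "".join(a[i] if i in keep else wildcard
                   for i in range(positions[0], positions[-1] + 1))


if __name__ == "__main__":
    print(rigid_pattern_occurrences("ACGTACGA", "A.G"))  # [0, 4]
    length, shift, pos = lcrs_two("ATCGG", "AACTG")
    print(length, shift, as_pattern("ATCGG", pos))        # 3 0 A.C.G
```

The single-shift argument only covers two sequences. With more sequences, each one needs its own shift relative to the first, and enumerating every combination of shifts grows exponentially with the number of sequences.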
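The modification the paper makes to the Position Weight Matrix is not described in the abstract, so the sketch below shows only a conventional log-odds PWM for DNA: per-column counts with a pseudocount, converted to log2 ratios against a background distribution (assumed uniform here), followed by a scan for the best-scoring window. The names `build_pwm`, `score` and `best_hit`, and the default pseudocount of 1, are assumptions for illustration.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

ALPHABET = "ACGT"


def build_pwm(sites: Sequence[str], pseudocount: float = 1.0,
              background: Optional[Dict[str, float]] = None) -> List[Dict[str, float]]:
    """Log-odds PWM from equal-length, gap-free, upper-case ACGT sites."""
    if background is None:
        background = {b: 0.25 for b in ALPHABET}  # assumed uniform background
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for col in range(width):
        counts = {b: pseudocount for b in ALPHABET}  # pseudocounts avoid log(0)
        for s in sites:
            counts[s[col]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / background[b]) for b in ALPHABET})
    return pwm


def score(pwm: List[Dict[str, float]], window: str) -> float:
    """Sum of per-column log-odds scores for a window as long as the PWM."""
    return sum(col[base] for col, base in zip(pwm, window))


def best_hit(pwm: List[Dict[str, float]], seq: str) -> Tuple[int, float]:
    """Highest-scoring window in seq as (start, score); seq must be at least as long as the PWM."""
    w = len(pwm)
    return max(((i, score(pwm, seq[i:i + w])) for i in range(len(seq) - w + 1)),
               key=lambda hit: hit[1])


if __name__ == "__main__":
    pwm = build_pwm(["ACGT", "ACGT", "ACGA"])
    start, s = best_hit(pwm, "TTACGTTT")
    print(start, round(s, 2))  # 2 4.36
```

Positive scores mark windows that look more like the training sites than like the background; a threshold on the score, rather than only the single best hit, is the usual way to report every candidate site.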
