CFP last date
20 May 2024
Reseach Article

Mining Longest Common Subsequence and other Related Patterns using DNA Operations

by A. Murugana, B. Lavanya
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 49 - Number 18
Year of Publication: 2012
Authors: A. Murugana, B. Lavanya
10.5120/7730-1178

A. Murugana, B. Lavanya . Mining Longest Common Subsequence and other Related Patterns using DNA Operations. International Journal of Computer Applications. 49, 18 ( July 2012), 38-44. DOI=10.5120/7730-1178

@article{ 10.5120/7730-1178,
author = { A. Murugana, B. Lavanya },
title = { Mining Longest Common Subsequence and other Related Patterns using DNA Operations },
journal = { International Journal of Computer Applications },
issue_date = { July 2012 },
volume = { 49 },
number = { 18 },
month = { July },
year = { 2012 },
issn = { 0975-8887 },
pages = { 38-44 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume49/number18/7730-1178/ },
doi = { 10.5120/7730-1178 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:46:36.061307+05:30
%A A. Murugana
%A B. Lavanya
%T Mining Longest Common Subsequence and other Related Patterns using DNA Operations
%J International Journal of Computer Applications
%@ 0975-8887
%V 49
%N 18
%P 38-44
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Longest Common Subsequence (LCS) and Shortest Common Subsequence (SCS) problems are to find subsequences in given sequences in which the subsequence is as long as possible and as short as possible subsequence respectively. These subsequences are not necessarily contiguous or unique. In this paper we have proposed two new approaches to find LCS and SCS, of N sequences parallely, using DNA operations. These approaches can be used to find LCS and SCS, of any window size, from any number of sequences, and from any type of input data. The proposed work can be applied to finding diverging patterns, constraint LCS, redescription mining, sequence alignment, speech recognition, find motifs in genetic data bases, pattern recognition, mine emerging patterns, contrast patterns in both scientific and commercial databases. Implementation results shown the correctness of the algorithms. Finally, the validity of the algorithms are checked and their time complexity is analyzed.

References
  1. Murugan. A. and Lavanya. B. DNA algorithmic approach to solve GCS problem. Journal of Computational Intelligence in Bioinformatics,3(2):239-247, 2010.
  2. Murugan. A. , Lavanya. B. , and Shyamala. K. A novel programming approach for DNA computing. International Journal of Computational Intelligence Research, 7(2):199-209, 2011.
  3. Lavanya. B. and Murugan. A. Discovering sequence motifs of different patterns parallelly using DNA operations. International Journal of Computer Applications, 3(1):18-24, Nov 2011.
  4. Lavanya. B. and Murugan. A. A DNA based approach to _nd closed repetitive gapped subsequence from a sequence database. International Journal of Computer Applications, 29(5):45-49, Sep 2011.
  5. Needleman. S. B. and Wunsch. C. D. A general method applicable to the search of similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 48:443-453, 1970.
  6. James Bailey, Thomas M Anoukian, and Kotagiri Ramamohanroa. Fast algorithms for mining emerging patterns. LCNS Springer, pages 39-50, 2002.
  7. Nikhil Bansal, Moshe Lewenstein, Bin Ma, and Kaishong Zhang. On the longest common rigid subsequence problem. Algorithmica, 56:270-280, 2010.
  8. Maier. D. The complexity of some problems on subsequences and super sequences. ACM, 25:322-336, 1978.
  9. Wu. T. D. and Brutlag. D. L. Identification of protein motifs using conserved amino acid properties and partitioning techniques. Proceedings of the 3rd International conference on Intelligent Systems for Molecular Biology, pages 402-410, 1995.
  10. Isabelle da Piedade, Man-Hung Eric Tang, and Olivier Elemento. DISPARE: discriminative pattern refinement for position weight matrices. BMC Bioinformatics, 10(388):1471-2105, 2009.
  11. Hirosawa et al. Comprehensive study on iterative algorithms of multiple sequence alignment. Computational Applications in Biosciences, 11:13-18, 1995.
  12. Neuwald. A. F. and Green. P. Detecting patterns in protein sequences. Journal of Molecular Biology, 239:698-712, 1994.
  13. Smith. R. F. and Smith. T. F. Automatic generation of primary sequence patterns from sets of related protein sequences. Nucleic Acid Research, 18:118-122, 1990. .
  14. Smith. T. F. and Waterman. M. S. Identification of common molecular subsequences. Journal of Molecular Biology, 147:195-197, 1981.
  15. Benson. G. and Waterman . M. S. A method for fast database search for all k-nucleotide repeats. 2nd International conference on Intelligent Systems for Molecular Biology, pages 83-98, 1994.
  16. Neville-Manning. C. G. , Sethi. K . S. , Wu. D. ,and Brutlag. A. D. Enumerating and ranking discrete motifs. Proceedings of Intelligent Systems for Molecular Biology, pages 202-209,1997.
  17. Stormo. G. DNA binding sites: representation and discovery. Bioinformatics, 16:16-23, 2000.
  18. Kyle Jensen. L. , Mark Styczynski. P. , Isidore Rigoutsos, and Gregory Stephanopoulos. V. A generic motif discovery algorithm for sequential data. Bioinformatics, 22(1):21-28, 2006.
  19. Wang. L. and Jiang. T. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1:337-348, 1994.
  20. Nan Li and Tompa. M. Analysis of computational tools for motif discovery. Algorithms of molecular biology, pages 1-8, 2006.
  21. Lo. D. and Khoo. S. D. Liu. Efficient mining of iterative patterns for software specification discovery. Int. Conf. on Knowledge Discovery and Data Mining, pages 460-469, 2007.
  22. Annila. H. M. , Toivonen. H. , and Verkamo. A. I. Discovery of frequent episodes in event sequences. Data Mining and Knowledge Discovery, 1(3):259-289, 1997.
  23. Landau. G. M. , Levy. A. , and Newman. I. LCS approximation via embedding into locally nonrepetitive strings. Information and Computation, 209:705-716, 2011.
  24. Li. M. , Ma. B. , and Wang. L. On the closest string and substring problems. J. ACM, 49(2):151-171, 2002.
  25. Martinez. M. An efficient method to find repeats in molecular sequences. Nucleic Acid Research, 11:4629-4634, 1983.
  26. Martinez. M. A flexible multiple sequence alignment program. Nucleic Acid Research, 16:1683-1691, 1988.
  27. Suyama. M. , Nishioka. T. , and Junichi. O. Searching for common sequence patterns among distantly related proteins. Protein Engineering, 8:1075-1080, 1995.
  28. Zhang. M. , Kao. B. , Cheung. B. , and Yip. K. Mining periodic patterns with gap requirement from sequences. SIGMOD Int. Conf. on Management of Data, pages 623-633, 2005.
  29. Bin Ma. A polynomial time approximation scheme for the closest substring problem. LCNS Springer, 1848:99-107, 2000.
  30. Smith. H. O. , Annau. T. M. , and Chandrasegaran. S. Finding sequence motifs in groups of functionally related proteins. Proceedings of National Academy (USA), 87:826-830, 1990.
  31. Agarwal. R. and Srikant. R. Mining sequential patterns. Int. Conf. on Data Engineering, 1995.
  32. Agarwal. R. and Srikant. R. Mining sequential patterns: Generalizations and performance improvements. Extending DataBase Technology, pages 3-17, 1996.
  33. Staden. R. Computer methods to locate signals in nucleic acid sequences. Nucleic Acids Res, 12:505-519, 1984.
  34. Isisdore Rigoutsos and Aris Floratos. Combinatorial pattern discovery in biological sequences: the teiresias algorithm. Bioinformatics, 14(1):55-67, 1998.
  35. Waterman. M. S. , Galas. D. J. , and Arratia. R. Pattern recognition in several sequences: consensus and alignment. Bulletin of Mathematical Biology, 46:515-527, 1984.
  36. Saurabh Sinha. On counting position weight matrix matches in a sequence, with application to discriminative motif finding. Bioinformatics, 22(14):454-463, 2006.
  37. Yin-Te Tsai. The constrained longest common subsequence problem. Information processing letter, 88:173-176, 2003.
  38. Qian Wan and Aijun An. Diverging patterns: Discovering signi_cant dissimilarities in large databases. Technical Report CSE-2008-10, Dec 2008.
  39. Guan. X. and Uberbacher. E. C. A fast lookup algorithm for detecting repetitive DNA sequences. Proceedings of the pacific symposium on Biocomputing, pages 718-719, 1996.
  40. Yan. X. , Han. J. , and Afhar. R. Colspan: Mining closed sequential patterns in large datasets. SIAM Int. Conf. Data Mining, pages 166-177, 2003.
Index Terms

Computer Science
Information Sciences

Keywords

DNA operations Motifs LCS SCS CLCS Pattern recognition Diverging pattern Exceptional mining