Research Article

Frequent Contiguous Pattern Mining Algorithms for Biological Data Sequences

by S. Rajasekaran, L. Arockiam
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 95 - Number 14
Year of Publication: 2014
Authors: S. Rajasekaran, L. Arockiam
10.5120/16661-6646

S. Rajasekaran, L. Arockiam. Frequent Contiguous Pattern Mining Algorithms for Biological Data Sequences. International Journal of Computer Applications. 95, 14 (June 2014), 15-20. DOI=10.5120/16661-6646

@article{ 10.5120/16661-6646,
author = { S. Rajasekaran and L. Arockiam },
title = { Frequent Contiguous Pattern Mining Algorithms for Biological Data Sequences },
journal = { International Journal of Computer Applications },
issue_date = { June 2014 },
volume = { 95 },
number = { 14 },
month = { June },
year = { 2014 },
issn = { 0975-8887 },
pages = { 15-20 },
numpages = {6},
url = { https://ijcaonline.org/archives/volume95/number14/16661-6646/ },
doi = { 10.5120/16661-6646 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:19:26.700269+05:30
%A S. Rajasekaran
%A L. Arockiam
%T Frequent Contiguous Pattern Mining Algorithms for Biological Data Sequences
%J International Journal of Computer Applications
%@ 0975-8887
%V 95
%N 14
%P 15-20
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Transaction sequences in market-basket analysis draw on a large alphabet but are short, whereas bio-sequences draw on a small alphabet but are long and may contain gaps. Pattern-finding algorithms for the two kinds of sequence therefore differ. Small patterns are more likely to recur in bio-sequences than in transaction sequences. These recurring small patterns are called Frequent Contiguous Patterns (FCPs). Finding FCPs is the challenging task in pattern mining of bio-sequences. FCPs give clues for genetic discovery and functional analysis, and also help to assemble the whole genome of a species. Most existing FCP algorithms are based on the Apriori method. They require repeated scans of the database and a large number of intermediate tables to produce their results, so they need large amounts of space and computation time. In this paper, we analyze a few of the currently available FCP algorithms, along with their advantages and disadvantages.
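
To make the cost of the Apriori-style approach concrete, here is a minimal Python sketch of level-wise frequent contiguous pattern mining. It is an illustration only, not one of the algorithms surveyed in the paper. The function name `frequent_contiguous_patterns`, the `min_support` and `max_len` parameters, and the definition of support as the number of sequences containing a pattern are all assumptions made for this example.

```python
from collections import defaultdict


def frequent_contiguous_patterns(sequences, min_support, max_len=None):
    """Level-wise (Apriori-style) search for frequent contiguous patterns.

    Assumption: a pattern's support is the number of sequences that
    contain it at least once as a contiguous substring.
    """
    frequent = {}
    prev_level = None  # frequent patterns of length k - 1
    k = 1
    while max_len is None or k <= max_len:
        counts = defaultdict(int)
        for seq in sequences:  # one full database scan per pattern length
            seen = set()
            for i in range(len(seq) - k + 1):
                sub = seq[i:i + k]
                # Apriori pruning: a length-k pattern can be frequent only
                # if both of its length-(k-1) substrings are frequent.
                if prev_level is not None and (
                    sub[:-1] not in prev_level or sub[1:] not in prev_level
                ):
                    continue
                seen.add(sub)
            for sub in seen:  # count each pattern once per sequence
                counts[sub] += 1
        level = {p: c for p, c in counts.items() if c >= min_support}
        if not level:
            break  # no frequent pattern of length k, so none are longer
        frequent.update(level)
        prev_level = level
        k += 1
    return frequent


if __name__ == "__main__":
    toy_db = ["ACGTAC", "TACGA", "GGACG"]  # made-up DNA-like sequences
    print(frequent_contiguous_patterns(toy_db, min_support=3))
```

The sketch rescans every sequence once per pattern length and keeps a table of counts for each level, which is the repeated scanning and intermediate storage that the abstract identifies as the weakness of Apriori-based methods.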

Index Terms

Computer Science
Information Sciences

Keywords

Frequent Contiguous Pattern, Apriori, Scalable Pattern Mining, Surprising Bio-Patterns, Spanning Tree