Parallel DNA Sequence Approximate Matching with Multi-Length Sequence Aware Approach

International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2018
Authors:
Hadeel Alazzam, Ahmad Sharieh
DOI: 10.5120/ijca2018916594

Hadeel Alazzam and Ahmad Sharieh. Parallel DNA Sequence Approximate Matching with Multi-Length Sequence Aware Approach. International Journal of Computer Applications 180(26):1-6, March 2018.

@article{10.5120/ijca2018916594,
	author = {Hadeel Alazzam and Ahmad Sharieh},
	title = {Parallel DNA Sequence Approximate Matching with Multi-Length Sequence Aware Approach},
	journal = {International Journal of Computer Applications},
	issue_date = {March 2018},
	volume = {180},
	number = {26},
	month = {Mar},
	year = {2018},
	issn = {0975-8887},
	pages = {1-6},
	numpages = {6},
	url = {http://www.ijcaonline.org/archives/volume180/number26/29117-2018916594},
	doi = {10.5120/ijca2018916594},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}
}

Abstract

DNA sequence approximate matching is one of the main challenges in Bioinformatics. Despite advances in technology, there is still a need for new algorithms that can accommodate the huge amount of Bioinformatics data. In this paper, a parallel n-gram approach for approximate matching is proposed, together with a method that takes into account the variety of DNA sequence lengths. The proposed approach showed satisfactory results in terms of time complexity compared to a parallel dynamic programming method.
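
As a rough illustration of the general idea named in the abstract, the following Python sketch scores approximate similarity between DNA sequences by comparing their n-gram profiles. It is not the authors' algorithm: the function names, the choice of n = 3, the Dice coefficient as the similarity measure, and the thread pool are all assumptions made for this example. Normalizing by the combined n-gram count is one simple way to keep scores comparable across sequences of different lengths; the paper's own length-aware method is not described in the abstract and is not reproduced here.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def ngrams(seq: str, n: int) -> Counter:
    """Count every overlapping substring of length n in seq."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Dice coefficient over n-gram multisets (assumed measure, not the paper's).

    Returns a value in [0, 1]; dividing by the combined n-gram count keeps
    scores comparable when the two sequences differ in length.
    """
    ga, gb = ngrams(a, n), ngrams(b, n)
    total = sum(ga.values()) + sum(gb.values())
    if total == 0:  # both sequences are shorter than n
        return 0.0
    shared = sum((ga & gb).values())  # multiset intersection
    return 2 * shared / total


def best_matches(query: str, database: list, n: int = 3, workers: int = 4) -> list:
    """Score the query against every database sequence and rank the results.

    Each comparison is independent of the others, which is what makes the
    problem easy to parallelize. A thread pool is used here only to show that
    structure; CPU-bound Python code would need processes to see a speedup.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda s: ngram_similarity(query, s, n), database))
    return sorted(zip(database, scores), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for seq, score in best_matches("ACGTAC", ["TTTTTT", "ACGTAC", "ACGTTT"]):
        print(seq, score)
```

Unlike a dynamic programming alignment, which fills a table whose size grows with the product of the two sequence lengths, this kind of n-gram comparison makes a single pass over each sequence, at the cost of ignoring the order in which n-grams occur.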

Keywords

DNA Sequence, Longest Common Sequence, N-gram, Parallel