CFP last date
22 April 2024
Reseach Article

Article:An Efficient Solution for Aligning Huge DNA Sequences

by Ahmad M.Hosny, Howida A Shedeed, Ashraf S. Hussein, Mohamed F. Tolba
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 32 - Number 6
Year of Publication: 2011
Authors: Ahmad M.Hosny, Howida A Shedeed, Ashraf S. Hussein, Mohamed F. Tolba
10.5120/3905-5472

Ahmad M.Hosny, Howida A Shedeed, Ashraf S. Hussein, Mohamed F. Tolba . Article:An Efficient Solution for Aligning Huge DNA Sequences. International Journal of Computer Applications. 32, 6 ( October 2011), 1-8. DOI=10.5120/3905-5472

@article{ 10.5120/3905-5472,
author = { Ahmad M.Hosny, Howida A Shedeed, Ashraf S. Hussein, Mohamed F. Tolba },
title = { Article:An Efficient Solution for Aligning Huge DNA Sequences },
journal = { International Journal of Computer Applications },
issue_date = { October 2011 },
volume = { 32 },
number = { 6 },
month = { October },
year = { 2011 },
issn = { 0975-8887 },
pages = { 1-8 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume32/number6/3905-5472/ },
doi = { 10.5120/3905-5472 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:18:26.703632+05:30
%A Ahmad M.Hosny
%A Howida A Shedeed
%A Ashraf S. Hussein
%A Mohamed F. Tolba
%T Article:An Efficient Solution for Aligning Huge DNA Sequences
%J International Journal of Computer Applications
%@ 0975-8887
%V 32
%N 6
%P 1-8
%D 2011
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Aligning DNA sequences is a fundamental problem in bioinformatics. The exponential growth of protein and DNA databases makes this problem pose a great amount of challenge. Exact methods, which produce optimum sequence alignment according to a scoring function, have quadratic time and space complexity. Therefore, most of the current solutions are based on heuristic methods, which do not guarantee an optimum solution. Recently, many parallel solutions were proposed in order to accelerate the exact methods. However, most of these solutions restrict the sequence’s sizes to be in kilobytes, in such a way that megabyte-scale genome comparison cannot be achieved. In addition, these solutions calculate only the alignment similarity score without finding the actual alignment. This paper presents an efficient solution to find the optimal alignment of the huge DNA sequences. This solution releases the condition of the sequence size to be in megabyte-scale instead of few kilobytes. The fundamental innovation in this work is developing efficient, linear space complexity, parallel solution to achieve the optimum alignment with relatively good performance. The shared memory parallel architecture is the focus of this work and therefore we have considered off-the-shelf systems like multi-core CPUs as well as advanced shared memory platforms. Experimental results show that, the proposed solution achieved high records compared to other solutions that targeted the same goal with less hardware requirements.

References
  1. C.D. Wunsch S.B. Needleman, "A general method applicable to the search for similarities in the amino acid sequence of two proteins," Journal of Molecular Biology, vol. 48, no. 3, pp. 443-453 , March 1970.
  2. M.S. Waterman T.F. Smith, "Identification of common molecular subsequences," Journal of Molecular Biology, vol. 147, no. 1, pp. 195-197, March 1981.
  3. William R. Pearson, "Searching protein sequence libraries: Comparison of the sensitivity and selectivity of the Smith-Waterman and FASTA algorithms," Genomics, vol. 11, no. 3, pp. 635-650 , November 1991.
  4. Altschul SF, "Gapped BLAST and PSI-BLAST: “A new generation of protein database search programs”," Nucleic Acids Res, vol. 25, no. 17, pp. 3389-3402, 1997.
  5. T. Majumder, A. Kalyanaraman, P. Pande S. Sarkar, "Hardware accelerators for biocomputing: A survey ," Circuits and Systems (ISCAS), Proceedings of 2010 IEEE International Symposium on, pp. 3789-3792, 2010.
  6. Sanjay Kumar Pandey ,Digvijay Pandey Binay Kumar Pandey, "A Survey of Bioinformatics Applications on Parallel Architectures," International Journal of Computer Applications, vol. 23, no. 4, pp. 21-25, 2011.
  7. Nuno Filipe Valentim Roma Tiago José Barreiros Martins de Almeida, "A Parallel Programming Framework for Multi-core DNA Sequence Alignment," in International Conference on Complex, Krakow, Poland , 2010, pp. 907-912.
  8. D. D. Shrimankar S. R. Sathe, "Parallelization of DNA Sequence Alignment using OpenMP," in International Conference on Communication, Computing & Security, New York, 2011, pp. 200-203.
  9. Michael Farrar, "Striped Smith–Waterman speeds database searches six times over other SIMD implementations," Bioinformatics, vol. 23, no. 2, pp. 156–161, 2007.
  10. A. Wozniak, "Using video-oriented instructions to speed up sequence comparison," Comput Appl Biosci, vol. 13, no. 2, pp. 145–150, 1997.
  11. Erling Seeberg Torbjørn Rognes, "Six-fold speed-up of Smith–Waterman sequence database searches using parallel processing on common microprocessors," Bioinformatics, vol. 16, no. 8, pp. 699–706, 2000.
  12. Rognes Torbjørn, "Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation," BMC Bioinformatics, vol. 12, no. 1, p. 221, 2011.
  13. B. Schmidt C. Chen, "Computing Large-Scale Alignments on a Multi-Cluster," Fifth IEEE International Conference on Cluster Computing (CLUSTER'03), pp. 38 - 45, 2003.
  14. S. Aluru S. Rajko, "Space and time optimal parallel sequence alignments," vol. 15, no. 12, pp. 1070-1081, 2004.
  15. Alba Cristina Magalhaes Alves de Melo, Mauricio Ayala-Rincon , Thomas M. Santana Azzedine Boukerche, "Parallel Smith-Waterman Algorithm for Local DNA Comparison in a Cluster of Workstations," in Experimental and Efficient Algorithms. Berlin : Springer, 2005, vol. 3503, pp. 131-144.
  16. A. Boukerche, , A. C. Melo R. B. Batista, "A parallel strategy for biological sequence alignment in restricted memory space," Journal of Parallel and Distribituted Computing, vol. 68, no. 4, pp. 548-561, April 2008.
  17. R. B. Batista, A. C. Magalhaes , A. C. Melo A. Boukerche, "Exact pairwise alignment of megabase genome biological sequences using a novel z-align parallel strategy," in Parallel & Distributed Processing, 2009. IPDPS 2009. IEEE International Symposium on, Rome, 2009, pp. 1-8.
  18. D.S. Hirschberg, "A linear space algorithm for computing longest common subsequences," Communications of the ACM, vol. 18, no. 6, pp. 341- 343, June 341- 43.
  19. C. Xu, T. Wang, L. Jin and Y. Zhang E. Li, "Parallel Linear Space Algorithm for Large-Scale Sequence Alignment," in Euro-Par 2005 Parallel Processing. Berlin: Springer, 2005, vol. 3648, pp. 644-644.
  20. D. Fan , W. Lin X. Ye, "A fast linear-space sequence alignment algorithm with dynamic parallelization framework," in IEEE Ninth International Conference on Computer and Information, 2009, pp. 274-279.
  21. H. K. Tsoi , W. Luk Y. Yamaguchi, "FPGA-Based Smith-Waterman Algorithm: Analysis and Novel Design," in Reconfigurable Computing: Architectures, Tools and Applications. Berlin : Springer , 2011, vol. 6578, pp. 181-192.
  22. P. Zhang, N. Sun, X. Jiang, X. Liu. L. Xu, "A reconfigurable accelerator for smith-waterman algorithm," Circuits and Systems II: Express Briefs, IEEE Transactions on, vol. 54, no. 12, pp. 1077 - 1081 , December 2007.
  23. "FPGA-Based Smith-Waterman Algorithm: Analysis and Novel Design," in Reconfigurable Computing: Architectures, Tools and Applications. Berlin: Springer, 2011, vol. 6578, pp. 181-192.
  24. Valle Giorgio Manavski Svetlin, "CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment," BMC Bioinformatics, vol. 9, no. 2, pp. 1-9, March 2008.
  25. Schmidt Bertil , Maskell Douglas Liu Yongchao, "CUDASW++2.0: enhanced Smith-Waterman protein database search on CUDA-enabled GPUs based on SIMT and virtualized SIMD abstractions," BMC Research Notes, vol. 3, no. 1, pp. 1-12, 2010.
  26. A. C. Melo E. O. Sandes, "CUDAlign: Using GPU to accelerate the comparison of Megabase genomic sequences," in Proceedings of the 15th ACM SIGPLAN symposium on Principles and practice of parallel programming, New York, 2010, pp. 137-146.
  27. Wayne Huang, John Johnson , Sheila Vaidya Yang Liu, "GPU Accelerated Smith-Waterman," in Computational Science – ICCS 2006. Berlin: Springer, 2006, vol. 3994, pp. 188-195.
  28. Xiandong Meng, "A High-Performance Heterogeneous Computing Platform for Biological Sequence Analysis," IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2009.
  29. Talal Bonny, Khaled N. Salama M. Affan Zidan, "High performance technique for database applications using a hybrid GPU/CPU platform," in Proceedings of the 21st edition of the great lakes symposium, New York, 2011, pp. 879 - 899.
  30. O. Gotoh, "An improved algorithm for matching biological sequences," Journal of Molecular Biology, vol. 162, no. 3, pp. 705–708, December 1982.
  31. Sean R. Eddy, Anders Krogh, , Graeme Mitchison Richard Durbin, Biological sequence analysis:Probabilistic Models of Proteins and Nucleic Acids.: Cambridge University Press, 1998.
  32. Dan Gusfield, Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology.: Cambridge University Press, 1997.
  33. Gregory F. Pfister, In search of clusters: the coming battle in lowly parallel computing. Upper Saddle River, NJ, USA: Prentice-Hall, Inc, 1995.
  34. Christian Ledergerber, Philipp Krähenbühl, Christophe Dessimoz Adam Szalkowski, "SWPS3 – fast multi-threaded vectorized Smith-Waterman for IBM Cell/B.E. and ×86/SSE2," BMC Research Notes, vol. 1, no. 1, p. 107, 2008.
  35. A Jankowski, A Modzelewski, A Piotrowski, A Zadrożny W Rudnicki, "The new SIMD Implementation of the Smith-Waterman Algorithm on Cell Microprocessor," Fundamenta Informaticae, vol. 96, no. 1-2, pp. 181–194, 2009.
Index Terms

Computer Science
Information Sciences

Keywords

sequence alignment megabase DNA sequence space-efficient parallel computing multi-core