A Survey of the State-of-the-Art Parallel Multiple Sequence Alignment Algorithms on Multicore Systems

International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2018
Authors:
Sara Shehab, Sameh Abdulah, Arabi E. Keshk
10.5120/ijca2018917658

Sara Shehab, Sameh Abdulah and Arabi E. Keshk. A Survey of the State-of-the-Art Parallel Multiple Sequence Alignment Algorithms on Multicore Systems. International Journal of Computer Applications 182(12):1-9, August 2018.

@article{10.5120/ijca2018917658,
	author = {Sara Shehab and Sameh Abdulah and Arabi E. Keshk},
	title = {A Survey of the State-of-the-Art Parallel Multiple Sequence Alignment Algorithms on Multicore Systems},
	journal = {International Journal of Computer Applications},
	issue_date = {August 2018},
	volume = {182},
	number = {12},
	month = {Aug},
	year = {2018},
	issn = {0975-8887},
	pages = {1-9},
	numpages = {9},
	url = {http://www.ijcaonline.org/archives/volume182/number12/29869-2018917658},
	doi = {10.5120/ijca2018917658},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}
}

Abstract

Evolutionary modeling applications are the best way to obtain the comprehensive information needed for an in-depth understanding of the evolution of organisms. These applications mainly depend on identifying the evolutionary history of existing organisms and understanding the relations between them, which is possible through a deep analysis of their biological sequences. Multiple Sequence Alignment (MSA) is an important tool in such applications because it gives an accurate representation of the relations between different biological sequences. In the literature, much effort has gone into presenting new MSA algorithms or improving existing ones; however, little effort has been devoted to optimizing parallel MSA algorithms. Large datasets have now become a reality, and big data has become a primary challenge in various fields, a challenge that new bioinformatics algorithms must also address. This survey presents four different parallel MSA algorithms: T-Coffee, MAFFT, MSAProbs, and M2Align. We provide a detailed discussion of each algorithm, including its strengths, weaknesses, and implementation details, and of the effectiveness of its parallel implementation compared to the other algorithms, taking into account MSA accuracy on two different datasets, BAliBASE and OXBench.
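The abstract compares MSA accuracy on the BAliBASE and OXBench reference alignments but does not state the scoring procedure here. As a hedged illustration only, the Python sketch below computes two scores commonly reported with BAliBASE: a sum-of-pairs (SP) score, the fraction of residue pairs aligned in the reference that the test alignment also aligns, and a total-column (TC) score, the fraction of reference columns reproduced exactly. It assumes '-' as the only gap character, rows of equal length, and that the test and reference alignments contain the same ungapped sequences in the same order. It is a simplification: it ignores core-block annotations and is not the exact procedure of either benchmark or of the survey. The function names (`residue_columns`, `aligned_pairs`, `sp_score`, `tc_score`) are hypothetical.

```python
from itertools import combinations


def residue_columns(alignment: list[str]) -> list[dict[int, int]]:
    """For each alignment column, map sequence index -> residue index (0-based).

    Gap characters ('-') are skipped, so every residue is identified by
    (sequence index, position in the ungapped sequence).
    """
    if not alignment or len({len(row) for row in alignment}) != 1:
        raise ValueError("alignment must be a non-empty list of equal-length rows")
    counters = [0] * len(alignment)  # next ungapped position per sequence
    columns = []
    for col in range(len(alignment[0])):
        entry = {}
        for seq, row in enumerate(alignment):
            if row[col] != "-":
                entry[seq] = counters[seq]
                counters[seq] += 1
        columns.append(entry)
    return columns


def aligned_pairs(alignment: list[str]) -> set[tuple[tuple[int, int], tuple[int, int]]]:
    """All residue pairs that the alignment places in the same column."""
    pairs = set()
    for entry in residue_columns(alignment):
        # sorted() orders residues by sequence index, giving each pair one canonical form
        for a, b in combinations(sorted(entry.items()), 2):
            pairs.add((a, b))
    return pairs


def sp_score(test: list[str], reference: list[str]) -> float:
    """Fraction of reference residue pairs that the test alignment reproduces."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)


def tc_score(test: list[str], reference: list[str]) -> float:
    """Fraction of reference columns (with at least 2 residues) reproduced exactly."""
    test_cols = {frozenset(c.items()) for c in residue_columns(test)}
    ref_cols = [frozenset(c.items()) for c in residue_columns(reference) if len(c) > 1]
    if not ref_cols:
        return 0.0
    return sum(col in test_cols for col in ref_cols) / len(ref_cols)


if __name__ == "__main__":
    # Toy example: the test alignment misplaces one G in the first sequence.
    reference = ["AC-GT", "ACAGT", "AC-GT"]
    test = ["ACG-T", "ACAGT", "AC-GT"]
    print(f"SP={sp_score(test, reference):.3f} TC={tc_score(test, reference):.2f}")
    # prints: SP=0.833 TC=0.75
```

Identifying residues by their ungapped position, rather than by column index, is what lets two alignments with different gap placements be compared at all. SP rewards partially correct columns (10 of the 12 reference pairs survive in the toy example), while TC only counts columns that are reproduced exactly, so it is the stricter of the two.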

Keywords

Bioinformatics; Multiple Sequence Alignment; Parallel Processing; Multicore Systems