Research Article

A Hybrid OpenMP-MPI Parallelization of Structure Software

by Rafal Dobosz, Richard Hurley, Sabine Mcconnell
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 118 - Number 11
Year of Publication: 2015
Authors: Rafal Dobosz, Richard Hurley, Sabine Mcconnell
10.5120/20786-3434

Rafal Dobosz, Richard Hurley, Sabine Mcconnell. A Hybrid OpenMP-MPI Parallelization of Structure Software. International Journal of Computer Applications. 118, 11 (May 2015), 1-9. DOI=10.5120/20786-3434

@article{ 10.5120/20786-3434,
author = { Rafal Dobosz and Richard Hurley and Sabine Mcconnell },
title = { A Hybrid OpenMP-MPI Parallelization of Structure Software },
journal = { International Journal of Computer Applications },
issue_date = { May 2015 },
volume = { 118 },
number = { 11 },
month = { May },
year = { 2015 },
issn = { 0975-8887 },
pages = { 1-9 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume118/number11/20786-3434/ },
doi = { 10.5120/20786-3434 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T23:01:22.676998+05:30
%A Rafal Dobosz
%A Richard Hurley
%A Sabine Mcconnell
%T A Hybrid OpenMP-MPI Parallelization of Structure Software
%J International Journal of Computer Applications
%@ 0975-8887
%V 118
%N 11
%P 1-9
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Big Data has an increasing impact on the use of bioinformatics software. One way to deal with this challenge is parallel computing. Using the program Structure as a case study, this paper investigates ways to counteract the challenges created by growing datasets. It proposes a hybrid OpenMP-MPI parallelization of the Markov chain Monte Carlo (MCMC) steps, which are an integral part of Structure, and analyses its performance under various scenarios. The results indicate that the parallelization produces significant speedups over the serial version in all scenarios tested. By adapting the program to the parallel architecture, it makes more efficient use of the hardware. This matters because it not only reduces the time required to perform existing analyses but also opens the door to analysing datasets that were previously impractically large.
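The abstract reports speedups of the hybrid version over the serial one. As a minimal sketch, assuming the conventional textbook definitions rather than anything taken from the paper itself, the snippet below computes speedup, parallel efficiency, the Amdahl's-law upper bound, and the Karp-Flatt experimentally determined serial fraction from wall-clock run times. The function names and example timings are hypothetical; `p` stands for the total number of processing elements, for example MPI processes multiplied by OpenMP threads per process in a hybrid run.

```python
"""Standard parallel-performance metrics for comparing a parallel run
against a serial baseline. Illustrative sketch only; the function names
are hypothetical and not taken from the paper or from Structure."""


def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup S(p) = T(1) / T(p) from measured wall-clock times."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("run times must be positive")
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, p: int) -> float:
    """Parallel efficiency E(p) = S(p) / p, where p counts all processing
    elements (e.g. MPI processes x OpenMP threads in a hybrid run)."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return speedup(t_serial, t_parallel) / p


def amdahl_bound(serial_fraction: float, p: int) -> float:
    """Amdahl's law: the largest speedup achievable on p processing
    elements when a fraction f of the serial run time cannot be
    parallelized, 1 / (f + (1 - f) / p)."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in [0, 1]")
    if p < 1:
        raise ValueError("p must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)


def karp_flatt(measured_speedup: float, p: int) -> float:
    """Karp-Flatt metric: experimentally determined serial fraction
    e = (1/S - 1/p) / (1 - 1/p), defined only for p > 1."""
    if p < 2:
        raise ValueError("p must be greater than 1")
    if measured_speedup <= 0:
        raise ValueError("measured_speedup must be positive")
    return (1.0 / measured_speedup - 1.0 / p) / (1.0 - 1.0 / p)
```

For instance, hypothetical timings of 1200 s for the serial run and 100 s on 16 cores give a speedup of 12 and an efficiency of 0.75. The Karp-Flatt serial fraction is then 1/45 (about 0.022), and feeding that fraction back into `amdahl_bound` with p = 16 reproduces the speedup of 12. Tracking the Karp-Flatt value as p grows is one way to tell whether lost efficiency comes from inherently serial work or from growing parallel overhead such as MPI communication.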

Index Terms

Computer Science
Information Sciences

Keywords

MPI, OpenMP, parallelization, Structure, MCMC, SPRNG, SHARCNET, speedup, Big Data, High Performance Computing