Research Article

An Efficient Bulk Synchronous Parallelized Scheduler for Bioinformatics Application on Public Cloud

by Siddu P. Algur, Leena I. Sakri
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 145 - Number 15
Year of Publication: 2016
DOI: 10.5120/ijca2016910882

Siddu P. Algur, Leena I. Sakri. An Efficient Bulk Synchronous Parallelized Scheduler for Bioinformatics Application on Public Cloud. International Journal of Computer Applications. 145, 15 (Jul 2016), 22-30. DOI=10.5120/ijca2016910882

@article{ 10.5120/ijca2016910882,
author = { Siddu P. Algur and Leena I. Sakri },
title = { An Efficient Bulk Synchronous Parallelized Scheduler for Bioinformatics Application on Public Cloud },
journal = { International Journal of Computer Applications },
issue_date = { Jul 2016 },
volume = { 145 },
number = { 15 },
month = { Jul },
year = { 2016 },
issn = { 0975-8887 },
pages = { 22-30 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume145/number15/25356-2016910882/ },
doi = { 10.5120/ijca2016910882 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T23:48:57.509399+05:30
%A Siddu P. Algur
%A Leena I. Sakri
%T An Efficient Bulk Synchronous Parallelized Scheduler for Bioinformatics Application on Public Cloud
%J International Journal of Computer Applications
%@ 0975-8887
%V 145
%N 15
%P 22-30
%D 2016
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Genomic sequence alignment across species is one of the most sought-after applications in bioinformatics. Future bioinformatics technologies are expected to produce genomic data on the terabyte scale. Sequence alignment at this scale requires supercomputers, which involve huge cost. Parallelization is a way forward for computing sequence alignments with limited cost and time. Cloud computing and the MapReduce framework play an important role in achieving parallelization for data-intensive bioinformatics applications, since they provide consistent performance over time and a good fault-tolerance mechanism. Existing gene sequencing methodologies are based on the Hadoop-MapReduce framework, which adopts a serial execution strategy; this is an area of concern. This work introduces Smith-Waterman alignment on a Bulk Synchronous Parallel MapReduce (SW-BSPMR) cloud platform for bioinformatics gene sequence alignment. It adopts the widely accepted and accurate Smith-Waterman (SW) algorithm for sequence alignment, together with a parallel synchronous scheduler for the map and reduce phases. A customized MapReduce framework on the Microsoft Azure cloud platform is developed to overcome the serial-execution issue in the Hadoop-MapReduce framework. The experimental study presented in this work shows that SW-BSPMR can accurately and effectively align genomic sequences of various read lengths.
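To make the two ingredients named in the abstract concrete, the following is a minimal Python sketch, not the authors' SW-BSPMR implementation. It shows a textbook Smith-Waterman local alignment score with a linear gap penalty, and one bulk-synchronous step in which all reads are scored in parallel (map), the step waits for every worker to finish (barrier), and the best hit is then selected (reduce). The scoring constants, the function names `sw_score` and `align_superstep`, and the use of a local thread pool in place of cloud worker nodes are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical scoring parameters; the abstract does not state the ones used.
MATCH, MISMATCH, GAP = 2, -1, -2


def sw_score(query: str, ref: str) -> int:
    """Return the best Smith-Waterman local alignment score (linear gap penalty)."""
    prev = [0] * (len(ref) + 1)  # previous row of the scoring matrix
    best = 0
    for q in query:
        curr = [0]  # column 0 is always 0 for local alignment
        for j, r in enumerate(ref, start=1):
            diag = prev[j - 1] + (MATCH if q == r else MISMATCH)
            # Local alignment: a cell never drops below zero.
            cell = max(0, diag, prev[j] + GAP, curr[j - 1] + GAP)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def align_superstep(reads, ref, workers=4):
    """One bulk-synchronous step: parallel map, barrier, then reduce."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map phase: each read is scored independently.
        # Collecting the results into a list acts as the barrier.
        scores = list(pool.map(lambda read: sw_score(read, ref), reads))
    # Reduce phase: runs only after every map task has completed.
    return max(zip(scores, reads))


if __name__ == "__main__":
    print(align_superstep(["AC", "ACGT", "TT"], "GGACGTGG"))
```

The thread pool here only stands in for distributed workers; in CPython it does not speed up CPU-bound scoring, so a real deployment would dispatch the map tasks to separate processes or cloud nodes.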

Index Terms

Computer Science
Information Sciences

Keywords

Bioinformatics, genomic sequence, scheduler, parallelization, Hadoop, Microsoft Azure