Research Article

HCLBLAST for Genome Sequence Matching

by Monika Yadav, Sonal Chaudhary
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 163 - Number 11
Year of Publication: 2017
Authors: Monika Yadav, Sonal Chaudhary
10.5120/ijca2017913777

Monika Yadav, Sonal Chaudhary. HCLBLAST for Genome Sequence Matching. International Journal of Computer Applications. 163, 11 (Apr 2017), 31-34. DOI=10.5120/ijca2017913777

@article{ 10.5120/ijca2017913777,
author = { Monika Yadav and Sonal Chaudhary },
title = { HCLBLAST for Genome Sequence Matching },
journal = { International Journal of Computer Applications },
issue_date = { Apr 2017 },
volume = { 163 },
number = { 11 },
month = { Apr },
year = { 2017 },
issn = { 0975-8887 },
pages = { 31-34 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume163/number11/27441-2017913777/ },
doi = { 10.5120/ijca2017913777 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T00:09:57.249296+05:30
%A Monika Yadav
%A Sonal Chaudhary
%T HCLBLAST for Genome Sequence Matching
%J International Journal of Computer Applications
%@ 0975-8887
%V 163
%N 11
%P 31-34
%D 2017
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Genome sequence matching is used to reveal biological information hidden in DNA and genome sequences. The main objective is to determine whether a given sequence is similar to other sequences. DNA sequences are matched to find similarities between diseases and to assess the intensity of a disease. The number of known sequences is large and the databases are still growing, so finding the sequences that match a given genome sequence across a complete database is a major challenge. Genome sequence matching algorithms such as BLAST are also computation intensive, because they perform a large number of string matching operations. Hadoop, a parallel processing framework for Big Data, is used to run these algorithms and to store the data. The genome sequence database can be stored on the Hadoop Distributed File System and then processed efficiently using Map/Reduce. The data is distributed in the form of blocks; for every block, a mapper instance is assigned to process it, and the output of all the mappers is combined by a reducer. This Map/Reduce process provides inter-node parallelism. To further speed up the process and to efficiently utilize resources such as the central processing unit (CPU) and the graphics processing unit (GPU), a parallel processing framework called OpenCL is used. In this work, OpenCL is integrated with Hadoop using an API called APARAPI. In addition to inter-node parallelism, intra-node parallelism is thus provided, and Map/Reduce is accelerated for the BLAST algorithm; the resulting approach is termed HCLBLAST. HCLBLAST is compared with the HBLAST and BLAST algorithms on different datasets. It is found that HCLBLAST outperforms both in all cases.
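The block-wise mapper and reducer flow described in the abstract can be illustrated with a small Python sketch. This is not the authors' HCLBLAST implementation and it uses neither Hadoop, OpenCL nor APARAPI: a local thread pool stands in for mappers running on separate nodes, and exact k-mer word hits stand in for the seeding stage of BLAST. All function names, the word length `k`, the block size and the toy database are assumptions made for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(records, block_size):
    """Split a list of (seq_id, sequence) records into fixed-size blocks."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def kmers(sequence, k):
    """Return the set of all length-k substrings (words) of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def map_block(block, query_words, k):
    """Mapper (hypothetical): count exact word hits per record in one block.

    The inner scan over positions is the kind of data-parallel loop that
    intra-node acceleration would target; here it simply runs on the CPU.
    """
    hits = []
    for seq_id, sequence in block:
        score = sum(1 for i in range(len(sequence) - k + 1)
                    if sequence[i:i + k] in query_words)
        if score > 0:
            hits.append((seq_id, score))
    return hits


def reduce_hits(per_block_hits):
    """Reducer (hypothetical): merge mapper outputs, rank by descending hits."""
    merged = [hit for hits in per_block_hits for hit in hits]
    return sorted(merged, key=lambda hit: (-hit[1], hit[0]))


def search(records, query, k=3, block_size=2, workers=2):
    """Run one mapper per block in a thread pool, then reduce the results."""
    query_words = kmers(query, k)
    blocks = split_into_blocks(records, block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_block = list(pool.map(lambda b: map_block(b, query_words, k), blocks))
    return reduce_hits(per_block)


# Toy database, invented for this example.
database = [
    ("seq1", "ACGTACGT"),
    ("seq2", "TTTTTTTT"),
    ("seq3", "GGACGTCC"),
    ("seq4", "CCCCACGA"),
]
results = search(database, "ACGT")
print(results)
```

With the toy data above, the query words are `ACG` and `CGT`, so `seq1` ranks first with four hits, followed by `seq3` and `seq4`, while `seq2` produces no hits and is dropped by its mapper. A real BLAST search would go on to extend and score these seed hits, which this sketch omits.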

Index Terms

Computer Science
Information Sciences

Keywords

DNA, HCLBLAST, BLAST, NGS