Research Article

HCLBLAST for Genome Sequence Matching

by Monika Yadav, Sonal Chaudhary
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 163 - Number 11
Year of Publication: 2017
Authors: Monika Yadav, Sonal Chaudhary
10.5120/ijca2017913777

Monika Yadav, Sonal Chaudhary. HCLBLAST for Genome Sequence Matching. International Journal of Computer Applications 163, 11 (Apr 2017), 31-34. DOI=10.5120/ijca2017913777

@article{ 10.5120/ijca2017913777,
author = { Monika Yadav, Sonal Chaudhary },
title = { HCLBLAST for Genome Sequence Matching },
journal = { International Journal of Computer Applications },
issue_date = { Apr 2017 },
volume = { 163 },
number = { 11 },
month = { Apr },
year = { 2017 },
issn = { 0975-8887 },
pages = { 31-34 },
numpages = { 4 },
url = { https://ijcaonline.org/archives/volume163/number11/27441-2017913777/ },
doi = { 10.5120/ijca2017913777 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Monika Yadav
%A Sonal Chaudhary
%T HCLBLAST for Genome Sequence Matching
%J International Journal of Computer Applications
%@ 0975-8887
%V 163
%N 11
%P 31-34
%D 2017
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Genome sequence matching is used to reveal biological information hidden in DNA and genome sequences. The main objective is to determine whether a given sequence is similar to another sequence. DNA sequences are matched to find similarities between diseases and to assess the intensity of a disease. The number of sequences is very large and the database is still growing, so finding matching sequences for a given genome sequence across the complete database is a big challenge. Genome sequence matching algorithms such as BLAST are also computation intensive, performing a large number of string matching operations. To run these algorithms and to store the data, which is Big Data, Hadoop is used. Hadoop is a parallel-processing Big Data framework: the genome sequence database can be stored on the Hadoop Distributed File System and then processed efficiently using Map/Reduce. The data is distributed in blocks; for every block an instance of the mapper processes that block, and the output of all the mappers is combined by the reducer. This Map/Reduce process provides inter-node parallelism. To further speed up processing and to efficiently utilize resources such as the central processing unit and the graphics processing unit, a parallel processing framework called OpenCL is used. In this work OpenCL is integrated with Hadoop using an API called Aparapi. In addition to inter-node parallelism, intra-node parallelism is provided and Map/Reduce is accelerated for the BLAST algorithm; the resulting system is termed HCLBLAST. HCLBLAST is compared with HBLAST and BLAST on different datasets and is found to outperform both in all cases.
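
The block-level Map/Reduce flow described in the abstract can be illustrated with a short sketch. The Java code below is not the authors' implementation; it is a minimal example, under stated assumptions, of how a Hadoop mapper might offload the word-matching (seeding) step of a BLAST-like comparison to the GPU through Aparapi. The class name GenomeMatchMapper, the configuration key hclblast.query, the record layout "<sequenceId>\t<sequence>", and the word size are illustrative assumptions, not details taken from the paper.

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import com.aparapi.Kernel;
import com.aparapi.Range;

// Hypothetical mapper: one instance per HDFS block, GPU-assisted word matching per record.
public class GenomeMatchMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final int WORD = 11;   // BLAST-style word size for nucleotide sequences
    private byte[] query;                 // query sequence, shared by all map() calls

    @Override
    protected void setup(Context context) {
        // Hypothetical configuration key; the query could also be shipped via the distributed cache.
        query = context.getConfiguration().get("hclblast.query", "").getBytes();
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Assumed record layout: "<sequenceId>\t<sequence>".
        String[] parts = value.toString().split("\t", 2);
        final int word = WORD;
        final byte[] q = query;
        if (parts.length < 2 || q.length < word) {
            return;
        }
        final byte[] subject = parts[1].getBytes();
        if (subject.length < word) {
            return;
        }
        final int[] hits = new int[subject.length - word + 1];

        // Aparapi kernel: one work-item per subject position counts exact word-length matches
        // against every query position. This covers only the seeding step of BLAST, not the
        // full seed-and-extend algorithm. If the bytecode cannot be translated to OpenCL,
        // Aparapi falls back to a Java thread pool, so the mapper still runs on CPU-only nodes.
        Kernel kernel = new Kernel() {
            @Override
            public void run() {
                int i = getGlobalId();
                int count = 0;
                for (int j = 0; j <= q.length - word; j++) {
                    boolean match = true;
                    for (int k = 0; k < word; k++) {
                        if (subject[i + k] != q[j + k]) {
                            match = false;
                        }
                    }
                    if (match) {
                        count++;
                    }
                }
                hits[i] = count;
            }
        };
        kernel.execute(Range.create(hits.length));
        kernel.dispose();

        int score = 0;
        for (int h : hits) {
            score += h;
        }
        context.write(new Text(parts[0]), new IntWritable(score));
    }
}

The GPU kernel running inside each mapper corresponds to what the abstract calls intra-node parallelism; running one such mapper per HDFS block and combining the per-sequence scores in the reducer supplies the inter-node parallelism.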

Index Terms

Computer Science
Information Sciences

Keywords

DNA, HCLBLAST, BLAST, NGS