CFP last date
20 May 2024
Reseach Article

BLOSUM Trie for Faster Hit Detection in FSA Protein BLAST

by M Anuradha, K Suman Nelson, P V G D Prasad Reddy
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 64 - Number 1
Year of Publication: 2013
Authors: M Anuradha, K Suman Nelson, P V G D Prasad Reddy
10.5120/10602-5306

M Anuradha, K Suman Nelson, P V G D Prasad Reddy . BLOSUM Trie for Faster Hit Detection in FSA Protein BLAST. International Journal of Computer Applications. 64, 1 ( February 2013), 46-53. DOI=10.5120/10602-5306

@article{ 10.5120/10602-5306,
author = { M Anuradha, K Suman Nelson, P V G D Prasad Reddy },
title = { BLOSUM Trie for Faster Hit Detection in FSA Protein BLAST },
journal = { International Journal of Computer Applications },
issue_date = { February 2013 },
volume = { 64 },
number = { 1 },
month = { February },
year = { 2013 },
issn = { 0975-8887 },
pages = { 46-53 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume64/number1/10602-5306/ },
doi = { 10.5120/10602-5306 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:15:18.557689+05:30
%A M Anuradha
%A K Suman Nelson
%A P V G D Prasad Reddy
%T BLOSUM Trie for Faster Hit Detection in FSA Protein BLAST
%J International Journal of Computer Applications
%@ 0975-8887
%V 64
%N 1
%P 46-53
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Basic Local Alignment Search Tool (BLAST) is one of the most widely used bioinformatics tools to determine similarities between genomic sequences. Ever since its inception several algorithmic improvements have been made to improve speed and runtime memory requirements without affecting the sensitivity and selectivity of the tool. Fast search algorithm (FSA) BLAST has been the most successful among such improvements with 20-30% faster processing rate. A DFA with hashed prefix word structures for the hit detection process in FSA BLAST has been proposed in the earlier work. Coding of the algorithms and testing on protein samples showed that the use of the new structure resulted in significant reduction in run time space but not the hit detection time. This paper proposes the use of a BLOSUM trie structure which eliminates the process of computing neighborhood words, resulted in a reduction of up to 75% in hit detection time. Tests were conducted with different BLOSUM matrices and threshold values and the proposed algorithm was found to be beneficial in terms of space, time complexity and accuracy without compromising on sensitivity and selectivity of the currently being used algorithm.

References
  1. Pertsemlidis, A. and John, W. Fondon III. 2001. Tutorial - Having BLAST with bioinformatics (and avoiding BLASTphemy), Genome Biology 2(10).
  2. Altschul, S. F. Gish, W. Miller, W. Myers, E. W. and D. J. Lipman, D. J. 1990. Basic local alignment search tool. Journal of Molecular Biology, 215(3):403– 410.
  3. Altschul, S. F. Madden, T. L. Schaffer, A. A. Zhang, J. Zhang, Miller, Z. W. and D. J. Lipman, D. J. 1997. Gapped BLAST and PSI–BLAST: A new generation of protein database search programs. Nucleic Acids Research, 25(17):3389–3402.
  4. WU-BLAST [http://blast. wustl. edu/]
  5. Boysen, C. and Marc, A. Rieffel. 2004. Enhancing BLAST Performance by using paracel filtering package, Paracel Technology.
  6. Cameron, M. . Williams, H. E. and Cannane, A. 2004. Improved gapped alignment in BLAST. IEEE Transactions on Computational Biology and Bioinformatics, 1(3):116-129.
  7. Cameron, M. . Williams, H. E. and Cannane, A. 2006. A deterministic finite automaton for faster protein hit detection in BLAST, Journal of Computational Biology 13(4), 965–978.
  8. McGinnis, S. and T. L. Madden, T. L. 2004. BLAST: at the core of a powerful and diverse set of sequence analysis tools. Nucleic Acids Research, 32:W20–W25.
  9. Chen, X. L. 2004. personal communication.
  10. Shapaer, E. G. Robinson, M. Yee, D. Candlin, J. D. Mines, R. and Hunkapiller, T. 1996. Sensitivity and selectivity in protein similarity searches: A Comparison of Smith-Waterman in Hardware to BLAST and FASTA. Genomics,38, 179-191.
  11. Gotoh, O. 1982. An improved algorithm for matching biological sequences. Journal of Molecular Biology, 162(3):705–708.
  12. Jian, Y. McGinnis, S. and Madden, T. L. 2006. BLAST: improvements for better sequence analysis, W6–W9 Nucleic Acids Research, Vol. 34, Web Server issue doi:10. 1093/nar/gkl164
  13. Noé,* L. and Kucherov, G. 2004. Improved hit criteria for DNA local alignment, BMC Bioinformatics, 5:149 doi:10. 1186/1471-2105-5-149.
  14. Delaney, S. Butler, G. Lam, C. and Thiel, L. Three Improvements to the BLASTP Search of Genome Databases, 0-7695-0686-0/00 $10. 00 _ 2000 IEEE
  15. Cameron, M. and Williams, H. E. 2007. Comparing Compressed Sequences for Faster Nucleotide BLAST Searches, IEEE Transactions.
  16. Afratis, P. Galanakis, C. Sotiriades, E. Mplemenos, G. G. Chrysos, G. Papaefstathiou, I. and Pnevmatikatos, D. Design and Implementation of a Database Filter for BLAST Acceleration, 978-3-9810801-5-5/DATE09 © 2009 EDAA.
  17. Guo, X. Wang, H. Vijay, D. Design of a FPGA-Based Parallel Architecture for BLAST Algorithm with Multi-hits Detection, 2011 Eighth International Conference on Information Technology: New Generations, 978-0-7695-4367-3/11 $26. 00 © 2011 IEEE, DOI 10. 1109/ITNG. 2011. 122.
  18. Boratyn, G. M. Schaffer, A. A. Agarwala, R. Altschul, S. F. Lipman, D. J. and Madden, T. L. 2012. Domain enhanced lookup time accelerated BLAST, Biol Direct Apr 17;7(1):12.
  19. Anuradha, M. Suman Nelson, K. and Prasad Reddy, P. V. G. D. March 2012. Improved hit detection algorithm for FSA protein BLAST. International Journal of Bioscience, Biochemistry and Bioinformatics, Vol. 2, No. 2:61-65.
  20. Wilbur, W. J. and Lipman, D. J. 1983. Rapid similarity searches of nucleic acid and protein data banks. Proceedings of the National Academy of Sciences USA, 80(3):726–730.
  21. Karlin, S. and Altschul, S. F. 1990. Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proceedings of the National Academy of Sciences USA, 87(6):2264–2268.
  22. Altschul, S. F. Boguski, M. Gish, W. and Wootton, J. 1994. Issues in searching molecular sequence databases. Nature Genetics, 6:119–129.
  23. Altschul, S. F. and Gish, W. Local alignment statistics. Methods in Enzymology, 266:460–480, 1996.
  24. Altschul, S. F. Bundschuh, R. Olsen, R. . and Hwa, T. 2001. The estimation of statistical parameters for local alignment score distributions. Nucleic Acids Research, 29(2):351–361.
  25. Henikoff, S. and Henikoff, J. 1992. Amino acid substitution matrices from protein blocks. Proceedings of the National Academy of Sciences USA, 89(22):10915–10919.
Index Terms

Computer Science
Information Sciences

Keywords

Deterministic finite automaton prefix word table neighborhood words query pointers hits and BLOSUM Trie