Research Article

Compression Algorithm for all Specified bases in Nucleic Acid Sequences

by Subhankar Roy, Sunirmal Khatua
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 75 - Number 4
Year of Publication: 2013
Authors: Subhankar Roy, Sunirmal Khatua
10.5120/13101-0399

Subhankar Roy, Sunirmal Khatua. Compression Algorithm for all Specified bases in Nucleic Acid Sequences. International Journal of Computer Applications. 75, 4 (August 2013), 29-34. DOI=10.5120/13101-0399

@article{ 10.5120/13101-0399,
author = { Subhankar Roy and Sunirmal Khatua },
title = { Compression Algorithm for all Specified bases in Nucleic Acid Sequences },
journal = { International Journal of Computer Applications },
issue_date = { August 2013 },
volume = { 75 },
number = { 4 },
month = { August },
year = { 2013 },
issn = { 0975-8887 },
pages = { 29-34 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume75/number4/13101-0399/ },
doi = { 10.5120/13101-0399 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:43:24.324282+05:30
%A Subhankar Roy
%A Sunirmal Khatua
%T Compression Algorithm for all Specified bases in Nucleic Acid Sequences
%J International Journal of Computer Applications
%@ 0975-8887
%V 75
%N 4
%P 29-34
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Organizations such as IT companies, colleges, and scientists regularly struggle to handle large data sets in many areas, for example biological research. These limitations also affect tasks such as fetching data through internet search and business analysis. Compression algorithms are therefore needed that are general in approach yet specialized for dissimilar kinds of data, so as to achieve the highest possible saving percentage. This article considers the compression of biological data, namely single- and double-stranded DNA and single-stranded RNA. Biological data are less random than ordinary text data, meaning that the sequences contain more redundancy, and they also have special properties such as different types of repeats. One such repeat is the dinucleotide repeat, which occurs frequently in sequences. The two proposed algorithms are based on this repeat and use a static fixed-length look-up table (LUT) to map between the input file and the output file.
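
The idea of a static fixed-length LUT over dinucleotides can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not give the actual table, so the 16-entry table below, the symbols `a` to `p`, the restriction to the bases A, C, G, T, the handling of an odd trailing base, and the definition of saving percentage as the relative reduction in character count are all assumptions made for illustration.

```python
from itertools import product

# Assumption: only the four completely specified DNA bases are handled here.
BASES = "ACGT"

# Hypothetical static LUT: each of the 16 dinucleotides maps to one
# lowercase symbol ('a'..'p'). The table used in the paper is not known.
DINUCLEOTIDES = ["".join(pair) for pair in product(BASES, repeat=2)]
ENCODE_LUT = {dn: chr(ord("a") + i) for i, dn in enumerate(DINUCLEOTIDES)}
DECODE_LUT = {symbol: dn for dn, symbol in ENCODE_LUT.items()}


def compress(seq):
    """Replace each consecutive pair of bases with its LUT symbol."""
    out = [ENCODE_LUT[seq[i:i + 2]] for i in range(0, len(seq) - 1, 2)]
    if len(seq) % 2:
        # An odd trailing base is kept verbatim; it is uppercase, so it
        # cannot collide with the lowercase LUT symbols.
        out.append(seq[-1])
    return "".join(out)


def decompress(data):
    """Expand LUT symbols back to dinucleotides; pass other characters through."""
    return "".join(DECODE_LUT.get(ch, ch) for ch in data)


def saving_percentage(original, compressed):
    """Assumed definition: relative reduction in length, as a percentage."""
    return (1 - len(compressed) / len(original)) * 100
```

For example, `compress("ACGTA")` yields two LUT symbols followed by the unpaired `A`, and `decompress` restores the original sequence. Incompletely specified bases, which the title and keywords refer to, would need a larger table and are left out of this sketch.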

References
  1. Subhankar Roy, Sunirmal Khatua, Sudipta Roy and Prof. Samir K. Bandyopadhyay, "An Efficient Biological Sequence Compression Technique Using LUT and Repeat in the Sequence", IOSRJCE, Vol. 6, Issue 1, pp. 42-50, Sep-Oct. 2012.
  2. R. K. Bharti and Prof. R. K. Singh, "A Biological Sequence Compression based on Look up Table (LUT) using Complementary Palindrome of Fixed Size", IJCA (0975–8887), Volume 35, No. 11, December 2011.
  3. Heba Afify, Muhammad Islam and Manal Abdel Wahed, "DNA lossless differential compression algorithm based on similarity of genomic sequence database", IJCSIT, Vol. 3, No 4, August 2011.
  4. R. K. Bharti and Prof. R. K. Singh, "A Biological sequence compression Based on Approximate repeat Using Variable length LUT", International Journal of Advances in Science and Technology, Vol. 3, No. 3, PP: 71-75, 2011.
  5. Suman Chakraborty, Sudipta Roy, Prof. Samir K. Bandyopadhyay, "Image Steganography Using DNA Sequence and Sudoku Solution Matrix", International Journal of Advanced Research in Computer Science and Software Engineering(IJARCSSE), Volume 2, Issue 2, February 2012.
  6. Department of Chemistry, Queen Mary University of London, "Nomenclature for Incompletely Specified Bases in Nucleic Acid Sequences".
  7. Xin Chen, Sam Kwong and Ming Li, "A Compression Algorithm for DNA Sequences, Using Approximate Matching for Better Compression Ratio to Reveal the True Characteristics of DNA", pp. 61-66, IEEE Engineering in Medicine and Biology, July/August 2001.
  8. Gary Benson, "Tandem repeats finder: a program to analyze DNA sequences", pp. 573-580, Oxford University Press, Nucleic Acids Research, Vol. 27, No. 2.
  9. Ateet Meheta & Bankim Patel, "DNA compression using hash based data structure", International Journal of Information Technology and Knowledge Management, pp. 383-386, Vol. 2, No. 2, July-December 2010.
  10. Sequences are taken from: http://www.ncbi.nlm.nih.gov/Genbank.
Index Terms

Computer Science
Information Sciences

Keywords

Completely and incompletely specified nucleic acid bases, static LUT, dinucleotide repeats, base pair, sequence line length, compressed sequence length, compression factor, saving percentage