Research Article

Survey of Compression of DNA Sequence

by Dhajvir Singh Rai, R. K. Bharti, Bhawana Parihar
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 73 - Number 6
Year of Publication: 2013
Authors: Dhajvir Singh Rai, R. K. Bharti, Bhawana Parihar
10.5120/12749-9672

Dhajvir Singh Rai, R. K. Bharti, Bhawana Parihar. Survey of Compression of DNA Sequence. International Journal of Computer Applications. 73, 6 (July 2013), 52-58. DOI=10.5120/12749-9672

@article{ 10.5120/12749-9672,
author = { Dhajvir Singh Rai and R. K. Bharti and Bhawana Parihar },
title = { Survey of Compression of DNA Sequence },
journal = { International Journal of Computer Applications },
issue_date = { July 2013 },
volume = { 73 },
number = { 6 },
month = { July },
year = { 2013 },
issn = { 0975-8887 },
pages = { 52-58 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume73/number6/12749-9672/ },
doi = { 10.5120/12749-9672 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:39:24.269321+05:30
%A Dhajvir Singh Rai
%A R. K. Bharti
%A Bhawana Parihar
%T Survey of Compression of DNA Sequence
%J International Journal of Computer Applications
%@ 0975-8887
%V 73
%N 6
%P 52-58
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Compressing large collections of data can improve retrieval times, because the CPU cost of decompression is offset by the reduced cost of seeking and retrieving data from disk. In this paper, the authors study different compression methods for large DNA sequences. They examine COMRAD, a DNA-specific compression method, and compare it with the dictionary-based compression methods LZ77, LZ78, and LZW and with the general-purpose compression method RAY. The analysis identifies which algorithm is better suited to compressing large collections of DNA sequences. A compression table and line graphs show which algorithm achieves the better compression ratio and compressed size, and which has the better compression and decompression times.
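To make the dictionary-based family concrete, here is a minimal sketch of textbook LZW applied to a DNA string, together with one possible compression-ratio calculation. It is not the implementation evaluated in the paper. The function names, the four-symbol ACGT alphabet, the fixed-width code size, and the 8-bits-per-base baseline are all assumptions made for illustration.

```python
def lzw_compress(seq: str) -> list[int]:
    """Textbook LZW over the alphabet ACGT (illustrative, not the paper's code)."""
    # Seed the dictionary with the four nucleotide symbols (assumed alphabet).
    dictionary = {base: i for i, base in enumerate("ACGT")}
    if any(symbol not in dictionary for symbol in seq):
        raise ValueError("sequence may only contain A, C, G, T")
    phrase = ""
    codes = []
    for symbol in seq:
        candidate = phrase + symbol
        if candidate in dictionary:
            # Keep extending the current phrase while it is already known.
            phrase = candidate
        else:
            # Emit the longest known phrase and register the new one.
            codes.append(dictionary[phrase])
            dictionary[candidate] = len(dictionary)
            phrase = symbol
    if phrase:
        codes.append(dictionary[phrase])
    return codes


def lzw_decompress(codes: list[int]) -> str:
    """Rebuild the sequence by regenerating the same dictionary on the fly."""
    if not codes:
        return ""
    dictionary = {i: base for i, base in enumerate("ACGT")}
    previous = dictionary[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == len(dictionary):
            # The code refers to the phrase that is about to be added.
            entry = previous + previous[0]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        dictionary[len(dictionary)] = previous + entry[0]
        previous = entry
    return "".join(out)


def compression_ratio(seq: str, codes: list[int]) -> float:
    """Compressed bits divided by original bits (assumed definition).

    Assumes the original stores one 8-bit character per base and that every
    code is written with the same fixed width.
    """
    if not seq:
        return 0.0
    width = max(2, max(codes).bit_length())
    return (width * len(codes)) / (8 * len(seq))
```

Under these assumptions, a ratio below 1.0 means the coded form is smaller than the plain-text input. A highly repetitive input such as `"ACGT" * 50` compresses well, while a short sequence with few repeats may not shrink at all.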

Index Terms

Computer Science
Information Sciences

Keywords

LZ77, LZ78, LZW, RAY, COMRAD, DNA Sequence