Call for Paper - November 2023 Edition
IJCA solicits original research papers for the November 2023 Edition. Last date of manuscript submission is October 20, 2023. Read More

A Vital Approach to compress the Size of DNA Sequence using LZW (Lempel-Ziv-Welch) with Fixed Length Binary Code and Tree Structure

Print
PDF
International Journal of Computer Applications
© 2012 by IJCA Journal
Volume 43 - Number 1
Year of Publication: 2012
Authors:
Nishad Pm
R. Manicka Chezian
10.5120/6065-8193

Nishad Pm and Manicka R Chezian. Article: A Vital Approach to compress the Size of DNA Sequence using LZW (Lempel-Ziv-Welch) with Fixed Length Binary Code and Tree Structure. International Journal of Computer Applications 43(1):7-9, April 2012. Full text available. BibTeX

@article{key:article,
	author = {Nishad Pm and R. Manicka Chezian},
	title = {Article: A Vital Approach to compress the Size of DNA Sequence using LZW (Lempel-Ziv-Welch) with Fixed Length Binary Code and Tree Structure},
	journal = {International Journal of Computer Applications},
	year = {2012},
	volume = {43},
	number = {1},
	pages = {7-9},
	month = {April},
	note = {Full text available}
}

Abstract

The genome of an organism contains all hereditary information encoded in Deoxyribonucleic Acid (DNA). Molecular sequence databases (e. g. ,EMBL, Genbank, DDJB, Entrez, SwissProt, etc) represent millions of DNA sequences filling many thousands of gigabytes and the databases are doubled in size every 6-8 months, which may go to beyond the limit of storage capacity. There are several text compression algorithm used for DNA compression. This paper proposes a new hybrid algorithm is used to compress DNA sequence, the algorithm is designed by combining the fixed length binary code with the LZW (Lempel-Ziv-Welch) compression algorithm. Initially the input sequence is divided in to fragments where each fragment consist of four nucleotides and fixed length binary code is assigned to each nucleotide then the pattern (STR and CHR) in LZW used the same for creating the dictionary. Assigning a new binary code for each pattern in the dictionary using a binary tree, and the sequence is replaced binary code for the longest match in the dictionary while compression. The proposed approach attains maximum compression in DNA sequences.

References

  • Raja Rajeswari and Dr. Allam Apparao "Genbit compress – algorithm for repetitive and non-repetitive DNA sequences" Journal of Theoretical and Applied Information Technology 2005
  • Ateet Mehta and Bankim Patel "DNA compression Using Hash Based Data Structure" International Journal of Information Technology and Knowledge Management July-December 2010, Volume 2, No. 2, pp. 383-386
  • S. Grumbach and F. Tahi, "A new challenge for compression algorithms: Genetic sequences," J. Inform. Process. Manage. , vol. 30, no. 6, pp. 875-866, 1994.
  • S. Grumbach and F. Tahi, "Compression ofDNA sequences," in Proc. IEEE Symp. Data Compression, Snowbird, UT, 1993, pp. 340-350.
  • I. H. Witten, R. Neal, and J. G. Cleary, "Arithmetic coding for data compression," Commun. ACM, vol. 30, pp. 52-541, Jun. 1987.
  • É. Rivals, J. P. Delahaye, M. Dauchet, and O. Delgrange, "A Guaranteed Compression Scheme for Repetitive DNA Sequences," LIFL Lille I Univ. , Tech. Rep. IT-285, 1995.
  • G. Korodi and I. Tabus, "An efficient normalized maximum likelihood algorithm for DNA sequence compression," ACM Trans. on Information Systems, vol. 23, no. 1, pp. 3–34, Jan. 2005.
  • Rafael C. Gonzalez and Richard E. Woods, Digital Image Processing, Reading, Massachusetts: Addison-Welsley Publishing Company, 1992.
  • WELCH, T. A. 1984. A technique for high-performance data compression. IEEE Comput. 17, 6, 8–19. 9
  • ZIV, J. AND LEMPEL, A. 1978. "Compression of individual sequences via variable-rate coding". IEEE Trans. Inform. Theory 24, 5, 530–536.
  • ZIV, J. AND LEMPEL, A. 1977. "A universal algorithm for sequential data compression". IEEE Trans. Inform. Theory 23, 3, 337–343.
  • D. A. Huffman, "A Method for the Construction of Minimum-Redundancy Codes", Proceedings of the I. R. E. , September 1952, pp 1098–1102.