A Vital Approach to compress the Size of DNA Sequence using LZW (Lempel-Ziv-Welch) with Fixed Length Binary Code and Tree Structure

Nishad Pm; R. Manicka Chezian

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 43 - Number 1

Year of Publication: 2012

Authors: Nishad Pm, R. Manicka Chezian

10.5120/6065-8193

Nishad Pm, R. Manicka Chezian. A Vital Approach to compress the Size of DNA Sequence using LZW (Lempel-Ziv-Welch) with Fixed Length Binary Code and Tree Structure. International Journal of Computer Applications. 43, 1 (April 2012), 7-9. DOI=10.5120/6065-8193

@article{ 10.5120/6065-8193,
author = { Nishad Pm and R. Manicka Chezian },
title = { A Vital Approach to compress the Size of DNA Sequence using LZW (Lempel-Ziv-Welch) with Fixed Length Binary Code and Tree Structure },
journal = { International Journal of Computer Applications },
issue_date = { April 2012 },
volume = { 43 },
number = { 1 },
month = { April },
year = { 2012 },
issn = { 0975-8887 },
pages = { 7-9 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume43/number1/6065-8193/ },
doi = { 10.5120/6065-8193 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T20:32:46.818230+05:30
%A Nishad Pm
%A R. Manicka Chezian
%T A Vital Approach to compress the Size of DNA Sequence using LZW (Lempel-Ziv-Welch) with Fixed Length Binary Code and Tree Structure
%J International Journal of Computer Applications
%@ 0975-8887
%V 43
%N 1
%P 7-9
%D 2012
%I Foundation of Computer Science (FCS), NY, USA

Abstract

The genome of an organism contains all hereditary information, encoded in deoxyribonucleic acid (DNA). Molecular sequence databases (e.g., EMBL, GenBank, DDBJ, Entrez, SwissProt) hold millions of DNA sequences occupying many thousands of gigabytes, and they double in size every 6-8 months, a growth rate that may exceed available storage capacity. Several text compression algorithms are used for DNA compression. This paper proposes a new hybrid algorithm for compressing DNA sequences that combines a fixed-length binary code with the LZW (Lempel-Ziv-Welch) compression algorithm. The input sequence is first divided into fragments of four nucleotides each, and a fixed-length binary code is assigned to each nucleotide. The LZW patterns (STR and CHR) then use these coded fragments to build the dictionary. Each pattern in the dictionary is assigned a new binary code using a binary tree, and during compression the sequence is replaced by the binary code of the longest match in the dictionary. The proposed approach is reported to attain maximum compression of DNA sequences.
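The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The 2-bit code table (`NUC_BITS`), the packing of each four-nucleotide fragment into one 8-bit symbol, and all function names are assumptions made for illustration, since the abstract does not give them. The LZW step is the textbook algorithm, and the binary-tree assignment of output codes is not modelled: dictionary indices are returned as plain integers.

```python
# Assumed 2-bit code per nucleotide; the paper's actual table is not given in the abstract.
NUC_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_NUC = {bits: nuc for nuc, bits in NUC_BITS.items()}
FRAGMENT = 4  # nucleotides per fragment, as stated in the abstract


def to_fragments(seq):
    """Pack each 4-nucleotide fragment into one 8-bit symbol (0..255)."""
    if len(seq) % FRAGMENT:
        raise ValueError("this sketch assumes a length that is a multiple of 4")
    symbols = []
    for i in range(0, len(seq), FRAGMENT):
        value = 0
        for nuc in seq[i:i + FRAGMENT]:
            value = (value << 2) | NUC_BITS[nuc]
        symbols.append(value)
    return symbols


def from_fragments(symbols):
    """Unpack 8-bit symbols back into a nucleotide string."""
    out = []
    for value in symbols:
        for shift in (6, 4, 2, 0):
            out.append(BITS_NUC[(value >> shift) & 0b11])
    return "".join(out)


def lzw_compress(symbols):
    """Textbook LZW over fragment symbols; returns dictionary indices."""
    dictionary = {(s,): s for s in range(256)}  # every single fragment is pre-loaded
    codes = []
    string = ()  # plays the role of STR
    for char in symbols:  # plays the role of CHR
        candidate = string + (char,)
        if candidate in dictionary:
            string = candidate  # keep extending the longest match
        else:
            codes.append(dictionary[string])
            dictionary[candidate] = len(dictionary)
            string = (char,)
    if string:
        codes.append(dictionary[string])
    return codes


def lzw_decompress(codes):
    """Inverse of lzw_compress; rebuilds the same dictionary on the fly."""
    if not codes:
        return []
    dictionary = {s: (s,) for s in range(256)}
    previous = dictionary[codes[0]]
    symbols = list(previous)
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == len(dictionary):
            entry = previous + (previous[0],)  # code defined by the step that emitted it
        else:
            raise ValueError("invalid LZW code")
        symbols.extend(entry)
        dictionary[len(dictionary)] = previous + (entry[0],)
        previous = entry
    return symbols


if __name__ == "__main__":
    sequence = "ACGT" * 4
    codes = lzw_compress(to_fragments(sequence))
    assert from_fragments(lzw_decompress(codes)) == sequence
    print(codes)
```

Under these assumptions, every possible fragment is one of 256 symbols, so the initial dictionary is complete and the first new pattern receives index 256. How each index is then written as bits is the part the paper assigns to the binary tree, and it is deliberately left out here.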

References

Raja Rajeswari and Allam Apparao, "Genbit compress – algorithm for repetitive and non-repetitive DNA sequences," Journal of Theoretical and Applied Information Technology, 2005.
Ateet Mehta and Bankim Patel, "DNA compression using hash based data structure," International Journal of Information Technology and Knowledge Management, vol. 2, no. 2, pp. 383-386, July-December 2010.
S. Grumbach and F. Tahi, "A new challenge for compression algorithms: Genetic sequences," J. Inform. Process. Manage., vol. 30, no. 6, pp. 875-886, 1994.
S. Grumbach and F. Tahi, "Compression of DNA sequences," in Proc. IEEE Symp. Data Compression, Snowbird, UT, 1993, pp. 340-350.
I. H. Witten, R. Neal, and J. G. Cleary, "Arithmetic coding for data compression," Commun. ACM, vol. 30, pp. 520-540, Jun. 1987.
É. Rivals, J. P. Delahaye, M. Dauchet, and O. Delgrange, "A guaranteed compression scheme for repetitive DNA sequences," LIFL Lille I Univ., Tech. Rep. IT-285, 1995.
G. Korodi and I. Tabus, "An efficient normalized maximum likelihood algorithm for DNA sequence compression," ACM Trans. on Information Systems, vol. 23, no. 1, pp. 3–34, Jan. 2005.
Rafael C. Gonzalez and Richard E. Woods, Digital Image Processing, Reading, Massachusetts: Addison-Wesley Publishing Company, 1992.
T. A. Welch, "A technique for high-performance data compression," IEEE Comput., vol. 17, no. 6, pp. 8–19, 1984.
J. Ziv and A. Lempel, "Compression of individual sequences via variable-rate coding," IEEE Trans. Inform. Theory, vol. 24, no. 5, pp. 530–536, 1978.
J. Ziv and A. Lempel, "A universal algorithm for sequential data compression," IEEE Trans. Inform. Theory, vol. 23, no. 3, pp. 337–343, 1977.
D. A. Huffman, "A method for the construction of minimum-redundancy codes," Proceedings of the I.R.E., pp. 1098–1102, September 1952.

Index Terms

Computer Science

Information Sciences

Keywords

EMBL, DDBJ, Genome, Deoxyribonucleic Acid (DNA), LZW, Nucleotide