Call for Paper - June Edition
IJCA solicits original research papers for the June Edition of IJCA. Last date of manuscript submission is May 21, 2012. Read More

An Efficient Horizontal and Vertical Method for Online DNA Sequence Compression

Print
PDF
International Journal of Computer Applications
© 2010 by IJCA Journal
Number 1 - Article 8
Year of Publication: 2010
Authors:
Kamta Nath Mishra
Dr. Anupam Aaggarwal
Dr. Edries Abdelhadi
Dr. Prakash C. Srivastava
10.5120/757-954

Kamta Nath Mishra, Dr. Anupam Aaggarwal, Dr. Edries Abdelhadi and Dr. Prakash C Srivastava. Article: An Efficient Horizontal and Vertical Method for Online DNA Sequence Compression. International Journal of Computer Applications 3(1):39–46, June 2010. Published By Foundation of Computer Science. BibTeX

@article{key:article,
	author = {Kamta Nath Mishra and Dr. Anupam Aaggarwal and Dr. Edries Abdelhadi and Dr. Prakash C. Srivastava},
	title = {Article: An Efficient Horizontal and Vertical Method for Online DNA Sequence Compression},
	journal = {International Journal of Computer Applications},
	year = {2010},
	volume = {3},
	number = {1},
	pages = {39--46},
	month = {June},
	note = {Published By Foundation of Computer Science}
}

Abstract

DNA matching has become one of the most used biometric identification method during the last several years. DNA stores the information for creating and organizing an organism. It can be thought of as a string over the alphabets {A, C, G, T, N}, which makes four chemical components that make it up. Here, N represents an unknown nucleotide. This unknown nucleotide may be either A, or C, or G, or T. The size of each sequence is varying in the range of millions to billions of nucleotides.

Compression of DNA is interesting for both practical reasons (such as reduced storage and transmission cost) and functional reasons (such as inferring structure and function from compression models). We present a new Lossless Compression algorithm; which compresses data first horizontally and then vertically. It is based on substitution and statistical methods. We claim that our algorithm achieves one of the best compression ratios for bench mark DNA sequences in comparison to other DNA sequence compression methods.

Reference

  • [AL2000] Alberto Apostolico and Stefano Lonardo, Compression of Biological Sequences By Greedy Off-Line Textual Substitution, 2000.
  • [AB92] A. Apostolico, D. Bresauer, and Z. Galil. Optimal Parallel algorithms for periods, palindromes, and squares. In unpublished 1992.
  • [LZ76] A. Lempel and J. Ziv, On the complexity of finite sequences, IEE Transaction Inform. Theory, 22(1): 75-81, 1976.
  • [CKL99] Chen, X, Kwong, S., and Li, M., A compression algorithm for DNA sequences and its application in genome comparison, Genome Informatics, 10:52-61, 1999
  • [Huf52] D.A. Huffman, A method for construction of minimum redundancy codes, In proc. IRE, volume 40, page 1098 – 11101, Sept 1952.
  • [LH87] D.A. Lelewer and D.S. Hirshberg, Data Compression. ACM computing Surveys, 19(3):261-287, 1987.
  • [EAHKNM2008] E.A. Hadi, K. N. Mishra, DNA Sequence Compression Algorithm, National Conference on Communication and Information Technology, Tripoli, May 19 – 21 2008.
  • [TG93] Fariza Tahi, Stephen Grumbach, A New Challenge for Compression Algorithms: Genetic Sequence, 1993
  • [TG94] Fariza Tahi, Stephen Grumbach, Compression of DNA Sequences: Extended Abstract, 1994
  • [KT2005] Gregely Korodi, Ioan Tabus, An Efficient Normal Maximum Likelyhood Algorithm for DNA sequence compression, 2005.
  • [Sto88] J.A. Storer. Data Compression methods and theory, Computer Science Press, 1988.
  • [ZL77] J. Ziv and A. Lempel, A universal algorithm for sequential data compression. IEEE Transactions on Information Theory, 22(3): 337-343, May 1977.
  • [ZL78] J. Ziv and A. Lempel, Compression of individual sequences via variable rate coding. IEEE Transactions on Information Theory, 24(5): 530-536, Sept 1978.
  • [SSZR2005] Sheng Bao, Shi Chen, Zhi-Qiang Jing, Ran Ren, DNA Sequence Compression Algorithm Based on LUT and LZ77, Proceeding of the fifth International Symposium on Signal Processing and Information Technology, Volume, Issue, 18 – 21 Dec. 2005 Page(s): 23 – 28.
  • [GT94] S. Grumbach, and H. Tahi, A New Challenge for compression Algorithms: genetic sequences, Information Processing and Management, 30:875-886, 1994.
  • [VVP] V.V. Pillay, Textbook of Forensic Medicine and Toxicology, Paras Publication, India
  • www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html, Jan 2008
Learn about the IJCA article correction policy and process
Dealing with any form of copyright/ intellectual infringement.
Excerpts from the book ‘Peer Review – A Critical Inquiry’ by David Shatz
Take advantage of the special issue on Network Security
Directly place requests for print/ hard copies of IJCA via Google Docs