|
10.5120/757-954 |
Kamta Nath Mishra, Dr. Anupam Aaggarwal, Dr. Edries Abdelhadi and Dr. Prakash C Srivastava. Article: An Efficient Horizontal and Vertical Method for Online DNA Sequence Compression. International Journal of Computer Applications 3(1):39–46, June 2010. Published By Foundation of Computer Science. BibTeX
@article{key:article,
author = {Kamta Nath Mishra and Dr. Anupam Aaggarwal and Dr. Edries Abdelhadi and Dr. Prakash C. Srivastava},
title = {Article: An Efficient Horizontal and Vertical Method for Online DNA Sequence Compression},
journal = {International Journal of Computer Applications},
year = {2010},
volume = {3},
number = {1},
pages = {39--46},
month = {June},
note = {Published By Foundation of Computer Science}
}
Abstract
DNA matching has become one of the most used biometric identification method during the last several years. DNA stores the information for creating and organizing an organism. It can be thought of as a string over the alphabets {A, C, G, T, N}, which makes four chemical components that make it up. Here, N represents an unknown nucleotide. This unknown nucleotide may be either A, or C, or G, or T. The size of each sequence is varying in the range of millions to billions of nucleotides.
Compression of DNA is interesting for both practical reasons (such as reduced storage and transmission cost) and functional reasons (such as inferring structure and function from compression models). We present a new Lossless Compression algorithm; which compresses data first horizontally and then vertically. It is based on substitution and statistical methods. We claim that our algorithm achieves one of the best compression ratios for bench mark DNA sequences in comparison to other DNA sequence compression methods.
Reference
-
[AL2000] Alberto Apostolico and Stefano Lonardo, Compression of Biological Sequences By Greedy Off-Line Textual Substitution, 2000.
[AB92] A. Apostolico, D. Bresauer, and Z. Galil. Optimal Parallel algorithms for periods, palindromes, and squares. In unpublished 1992.
[LZ76] A. Lempel and J. Ziv, On the complexity of finite sequences, IEE Transaction Inform. Theory, 22(1): 75-81, 1976.
[CKL99] Chen, X, Kwong, S., and Li, M., A compression algorithm for DNA sequences and its application in genome comparison, Genome Informatics, 10:52-61, 1999
[Huf52] D.A. Huffman, A method for construction of minimum redundancy codes, In proc. IRE, volume 40, page 1098 – 11101, Sept 1952.
[LH87] D.A. Lelewer and D.S. Hirshberg, Data Compression. ACM computing Surveys, 19(3):261-287, 1987.
[EAHKNM2008] E.A. Hadi, K. N. Mishra, DNA Sequence Compression Algorithm, National Conference on Communication and Information Technology, Tripoli, May 19 – 21 2008.
[TG93] Fariza Tahi, Stephen Grumbach, A New Challenge for Compression Algorithms: Genetic Sequence, 1993
[TG94] Fariza Tahi, Stephen Grumbach, Compression of DNA Sequences: Extended Abstract, 1994
[KT2005] Gregely Korodi, Ioan Tabus, An Efficient Normal Maximum Likelyhood Algorithm for DNA sequence compression, 2005.
[Sto88] J.A. Storer. Data Compression methods and theory, Computer Science Press, 1988.
[ZL77] J. Ziv and A. Lempel, A universal algorithm for sequential data compression. IEEE Transactions on Information Theory, 22(3): 337-343, May 1977.
[ZL78] J. Ziv and A. Lempel, Compression of individual sequences via variable rate coding. IEEE Transactions on Information Theory, 24(5): 530-536, Sept 1978.
[SSZR2005] Sheng Bao, Shi Chen, Zhi-Qiang Jing, Ran Ren, DNA Sequence Compression Algorithm Based on LUT and LZ77, Proceeding of the fifth International Symposium on Signal Processing and Information Technology, Volume, Issue, 18 – 21 Dec. 2005 Page(s): 23 – 28.
[GT94] S. Grumbach, and H. Tahi, A New Challenge for compression Algorithms: genetic sequences, Information Processing and Management, 30:875-886, 1994.
[VVP] V.V. Pillay, Textbook of Forensic Medicine and Toxicology, Paras Publication, India
www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html, Jan 2008
UNITED STATES




