CFP last date
22 April 2024
Reseach Article

A Compression and Encryption Algorithms on DNA Sequences using R^2CP and modified Huffman Technique

by Md. Syed Mahamud Hossein
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 57 - Number 1
Year of Publication: 2012
Authors: Md. Syed Mahamud Hossein
10.5120/9075-6994

Md. Syed Mahamud Hossein . A Compression and Encryption Algorithms on DNA Sequences using R^2CP and modified Huffman Technique. International Journal of Computer Applications. 57, 1 ( November 2012), 1-10. DOI=10.5120/9075-6994

@article{ 10.5120/9075-6994,
author = { Md. Syed Mahamud Hossein },
title = { A Compression and Encryption Algorithms on DNA Sequences using R^2CP and modified Huffman Technique },
journal = { International Journal of Computer Applications },
issue_date = { November 2012 },
volume = { 57 },
number = { 1 },
month = { November },
year = { 2012 },
issn = { 0975-8887 },
pages = { 1-10 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume57/number1/9075-6994/ },
doi = { 10.5120/9075-6994 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:59:18.744987+05:30
%A Md. Syed Mahamud Hossein
%T A Compression and Encryption Algorithms on DNA Sequences using R^2CP and modified Huffman Technique
%J International Journal of Computer Applications
%@ 0975-8887
%V 57
%N 1
%P 1-10
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

A lossless compression algorithm, for genetic sequences, based on two phase, 1st phase- searching for exact R2CP is reported. The compression results obtained in the algorithm show that the exact R2CP are one of the main hidden regularities in DNA sequences. The proposed DNA sequence compression algorithm is based on R2CP substring and creates online Library file acting as a Look Up Table. The R2CP substring is replaced by corresponding ASCII character. Information security is the most challenging question to protect the data from piracy. This proposed method may protect the data from hackers. For better security purpose we have introduced a new security technique in 2nd phase that is selection encryption method. In this technique the data are encrypted either in the Look Up table or in compressed file or in both. It can also provide the data security, by using ASCII code and online library file acting as a signature. The size of library file is too small with respect to compressed file. Compressing the genome sequence will help to increase the effect of their uses. Speed of encryption and security levels are two important measurements for evaluating any encryption system. Selective encryption, where a part of message is encrypted keeping the remaining part unencrypted, can be a viable proposition for running encryption system in resource constraint. This algorithm is tested on benchmark DNA sequences. The running time of this algorithm is very few second and the complexity is O(n2). The algorithm can approach a compression rate of 3. 447387 bit/base using 1st phase compression technique, again the output of the 1st phase compression are used in 2nd phase compression techniques, at the end ultimate the resultant compression rate of 2. 01 bit/bases. When a user searches for any sequence for an organism, a encrypted compressed sequence file can be sent from the data source to the user. The encrypted compressed file then can be decrypted & decompressed at the client end resulting in reduced transmission time over the Internet. A encrypted compression algorithm that provides a moderately high compression with encryption rate with minimal decryption with decompression time.

References
  1. M. Li and P. Vitányi, An Introduction to Kolmogorov Complexity and Its Applications, 2nd ed. New York: Springer-Verlag, 1997.
  2. Curnow, R. and Kirkwood, T. , Statistical analysis of deoxyribonucleic acid sequence data { a review, J. Royal Statistical Society, 152:199{220, 1989.
  3. Grumbach, S. and Tahi, F. , A new challenge for compression algorithms: genetic sequences, J. Information Processing and Management, 30(6):875-866, 1994.
  4. Lanctot, K. , Li, M. , and Yang, E. H. , Estimating DNA sequence entropy, to appear in SODA '2000.
  5. Rivals, _E. , Delahaye, J. -P. , Dauchet, M. , and Delgrange, O. , A Guaranteed Compression Scheme for Repetitive DNA Sequences, LIFL Lille I University, technical report IT-285, 1995.
  6. Bell, T. C. , Cleary, J. G. , and Witten, I. H. , Text Compression, Prentice Hall, 1990.
  7. Ma,B. , Tromp,J. and Li,M. (2002) PatternHunter—faster and more sensitive homology search. Bioinformatics, 18, 440–445. 1698
  8. H. Cheng and X. Li, "Partial Encryption of Compressed Images and Video," IEEE Transactions on Signal Processing, 48(8), 2000, pp. 2439-2451.
  9. C. E. Shannon, "Communication theory of secrecy systems," Bell Systems Technical Journal, v. 28, October 1949, pp. 656-715.
  10. D. A. Huffman, "A method for the construction of minimum-redundancy codes,"Proc. IRE, vol. 40, pp. 1098-1101,1952.
  11. Chen, L. , Lu, S. and Ram J. 2004. "Compressed Pattern Matching in DNA Sequences". Proceedings of the 2004 IEEE Computational Systems Bioinformatics Conference (CSB 2004)
  12. S. F. Altchul, W. Gish, W. Miller,E. W. Myers, and D. J. Lipman,1990, A. Basic Local Alignment search tool, J. Mol. Biol. 215 :403-410.
  13. S. Grumbach and F. Tahi, "A new challenge for compression algorithms: Genetic sequences," J. Inform. Process. Manage. , vol. 30, no. 6, pp. 875-866, 1994.
  14. E. Balagurusamy, Introduction to Computing. McGraw-Hill,1998
  15. K. R. Venugopal & S. R. Prasad, Mastering C. McGraw-Hill,1998
  16. Xin Chen, San Kwong and Mine Li, "A Compression Algorithm for DNA Sequences Using Approximate Matching for Better Compression Ratio to Reveal the True Characteristics of DNA", IEEE Engineering in Medicine and Biology,pp 61-66,July/August 2001.
  17. Chen, X. , Li, M. , Ma, B. and J. Tromp. 2002. "DNACompress: fast and effective DNA sequence compression". Bioinformatics. 18: 1696-1698.
  18. ASCII code. [Online]. Available: http://www. asciitable. com
  19. National Center for Biotechnology Information, http://www. ncbi. nlm. nih. gov
Index Terms

Computer Science
Information Sciences

Keywords

Repeat Reverse Complement Palindrome Compression Security