Call for Paper - January 2023 Edition
IJCA solicits original research papers for the January 2023 Edition. Last date of manuscript submission is December 20, 2022. Read More

A Compression and Encryption Algorithms on DNA Sequences using R^2CP and modified Huffman Technique

Print
PDF
International Journal of Computer Applications
© 2012 by IJCA Journal
Volume 57 - Number 1
Year of Publication: 2012
Authors:
Md. Syed Mahamud Hossein
10.5120/9075-6994

Md. Syed Mahamud Hossein. Article: A Compression and Encryption Algorithms on DNA Sequences using R^2CP and modified Huffman Technique. International Journal of Computer Applications 57(1):1-10, November 2012. Full text available. BibTeX

@article{key:article,
	author = {Md. Syed Mahamud Hossein},
	title = {Article: A Compression and Encryption Algorithms on DNA Sequences using R^2CP and modified Huffman Technique},
	journal = {International Journal of Computer Applications},
	year = {2012},
	volume = {57},
	number = {1},
	pages = {1-10},
	month = {November},
	note = {Full text available}
}

Abstract

A lossless compression algorithm, for genetic sequences, based on two phase, 1st phase- searching for exact R2CP is reported. The compression results obtained in the algorithm show that the exact R2CP are one of the main hidden regularities in DNA sequences. The proposed DNA sequence compression algorithm is based on R2CP substring and creates online Library file acting as a Look Up Table. The R2CP substring is replaced by corresponding ASCII character. Information security is the most challenging question to protect the data from piracy. This proposed method may protect the data from hackers. For better security purpose we have introduced a new security technique in 2nd phase that is selection encryption method. In this technique the data are encrypted either in the Look Up table or in compressed file or in both. It can also provide the data security, by using ASCII code and online library file acting as a signature. The size of library file is too small with respect to compressed file. Compressing the genome sequence will help to increase the effect of their uses. Speed of encryption and security levels are two important measurements for evaluating any encryption system. Selective encryption, where a part of message is encrypted keeping the remaining part unencrypted, can be a viable proposition for running encryption system in resource constraint. This algorithm is tested on benchmark DNA sequences. The running time of this algorithm is very few second and the complexity is O(n2). The algorithm can approach a compression rate of 3. 447387 bit/base using 1st phase compression technique, again the output of the 1st phase compression are used in 2nd phase compression techniques, at the end ultimate the resultant compression rate of 2. 01 bit/bases. When a user searches for any sequence for an organism, a encrypted compressed sequence file can be sent from the data source to the user. The encrypted compressed file then can be decrypted & decompressed at the client end resulting in reduced transmission time over the Internet. A encrypted compression algorithm that provides a moderately high compression with encryption rate with minimal decryption with decompression time.

References

  • M. Li and P. Vitányi, An Introduction to Kolmogorov Complexity and Its Applications, 2nd ed. New York: Springer-Verlag, 1997.
  • Curnow, R. and Kirkwood, T. , Statistical analysis of deoxyribonucleic acid sequence data { a review, J. Royal Statistical Society, 152:199{220, 1989.
  • Grumbach, S. and Tahi, F. , A new challenge for compression algorithms: genetic sequences, J. Information Processing and Management, 30(6):875-866, 1994.
  • Lanctot, K. , Li, M. , and Yang, E. H. , Estimating DNA sequence entropy, to appear in SODA '2000.
  • Rivals, _E. , Delahaye, J. -P. , Dauchet, M. , and Delgrange, O. , A Guaranteed Compression Scheme for Repetitive DNA Sequences, LIFL Lille I University, technical report IT-285, 1995.
  • Bell, T. C. , Cleary, J. G. , and Witten, I. H. , Text Compression, Prentice Hall, 1990.
  • Ma,B. , Tromp,J. and Li,M. (2002) PatternHunter—faster and more sensitive homology search. Bioinformatics, 18, 440–445. 1698
  • H. Cheng and X. Li, "Partial Encryption of Compressed Images and Video," IEEE Transactions on Signal Processing, 48(8), 2000, pp. 2439-2451.
  • C. E. Shannon, "Communication theory of secrecy systems," Bell Systems Technical Journal, v. 28, October 1949, pp. 656-715.
  • D. A. Huffman, "A method for the construction of minimum-redundancy codes,"Proc. IRE, vol. 40, pp. 1098-1101,1952.
  • Chen, L. , Lu, S. and Ram J. 2004. "Compressed Pattern Matching in DNA Sequences". Proceedings of the 2004 IEEE Computational Systems Bioinformatics Conference (CSB 2004)
  • S. F. Altchul, W. Gish, W. Miller,E. W. Myers, and D. J. Lipman,1990, A. Basic Local Alignment search tool, J. Mol. Biol. 215 :403-410.
  • S. Grumbach and F. Tahi, "A new challenge for compression algorithms: Genetic sequences," J. Inform. Process. Manage. , vol. 30, no. 6, pp. 875-866, 1994.
  • E. Balagurusamy, Introduction to Computing. McGraw-Hill,1998
  • K. R. Venugopal & S. R. Prasad, Mastering C. McGraw-Hill,1998
  • Xin Chen, San Kwong and Mine Li, "A Compression Algorithm for DNA Sequences Using Approximate Matching for Better Compression Ratio to Reveal the True Characteristics of DNA", IEEE Engineering in Medicine and Biology,pp 61-66,July/August 2001.
  • Chen, X. , Li, M. , Ma, B. and J. Tromp. 2002. "DNACompress: fast and effective DNA sequence compression". Bioinformatics. 18: 1696-1698.
  • ASCII code. [Online]. Available: http://www. asciitable. com
  • National Center for Biotechnology Information, http://www. ncbi. nlm. nih. gov