CFP last date
22 April 2024
Call for Paper
May Edition
IJCA solicits high quality original research papers for the upcoming May edition of the journal. The last date of research paper submission is 22 April 2024

Submit your paper
Know more
Reseach Article

Byte Pair Transformation using Zero-Frequency Bytes with Varying Number of Passes

by Jyotika Doshi, Savita Gandhi
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 95 - Number 20
Year of Publication: 2014
Authors: Jyotika Doshi, Savita Gandhi
10.5120/16711-6869

Jyotika Doshi, Savita Gandhi . Byte Pair Transformation using Zero-Frequency Bytes with Varying Number of Passes. International Journal of Computer Applications. 95, 20 ( June 2014), 25-30. DOI=10.5120/16711-6869

@article{ 10.5120/16711-6869,
author = { Jyotika Doshi, Savita Gandhi },
title = { Byte Pair Transformation using Zero-Frequency Bytes with Varying Number of Passes },
journal = { International Journal of Computer Applications },
issue_date = { June 2014 },
volume = { 95 },
number = { 20 },
month = { June },
year = { 2014 },
issn = { 0975-8887 },
pages = { 25-30 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume95/number20/16711-6869/ },
doi = { 10.5120/16711-6869 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:19:57.925380+05:30
%A Jyotika Doshi
%A Savita Gandhi
%T Byte Pair Transformation using Zero-Frequency Bytes with Varying Number of Passes
%J International Journal of Computer Applications
%@ 0975-8887
%V 95
%N 20
%P 25-30
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Byte pair encoding (BPE) algorithm was suggested by P. Gage is to achieve data compression. It encodes all instances of most frequent byte-pair using zero-frequency byte in the source data. This process is repeated for maximum m possible number of passes until no further compression is possible, either because there are no more frequently occurring byte pairs or there are no more unused zero-frequency bytes to represent pairs. It writes out substitution information before the encoded data in each pass. This algorithm is very time consuming as it requires to determine most frequent byte-pair in each pass before starting substitution. We have proposed k-pass byte-pair transformation algorithm where k may be very very small as compared to maximum possible passes m. Our aim is to minimize the compression time and achieve equvivalent compression rate. Proposed algorithm transforms half of the possible most-frequent byte pairs in each pass except the last. In the last pass, it transforms all remaining possible byte-pairs. This reduced number of passes save the time taken in computing frequency of byte-pairs in maximum m passes. Experimental results have shown that proposed algorithm had taken 3. 213, 9. 794, 13. 324, 16. 323, 22. 388 seconds with 1, 2, 3, 4 and 6 passes respectively as compared to 295. 642 seconds of m-passes. Compression rate achieved due to transformation is 14. 72%, 20. 12%, 21. 89%, 22. 67% and 22. 96% with 1, 2, 3, 4 and 6 passes respectively as compared to 25. 55% using maximum m-passes. As the number of passes increases, compression is better with increased execution time. Our aim of achieving speed is achieved with little loss in compression rate.

References
  1. Philip Gage, "A New Algorithm For Data Compression", The C Users Journal, vol. 12(2)2, pp. 23–38, February 1994
  2. Altan Mesut, Aydin Carus, "ISSDC: Digram Coding Based Lossless Data Compression Algorithm", Computing and Informatics, Vol. 29, pp. 741–754, 2010
  3. Sayood Khalid, "Introduction to Data Compression",2nd edition, Morgan Kaufmann, 2000
  4. Ian H. Witten, Alistair Moffat, Timothy C. Bell, "Managing Gigabytes-Compressing and Indexing Documents and Images", 2nd edition, Morgan Kaufmann Publishers, 1999
  5. mattmahoney. net/dc/bpe2v2. cpp
  6. Jyotika Doshi, Savita Gandhi, "Quad-Byte Transformation as a Pre-processing to Arithmetic Coding", International Journal of Engineering Research & Technology (IJERT), Vol. 2 Issue 12, December 2013, e-ISSN: 2278-0181
  7. Jyotika Doshi, Savita Gandhi, "Article: Achieving Better Compression Applying Index-based Byte-Pair Transformation before Arithmetic Coding", International Journal of Computer Applications 90(13):42-47, March 2014.
  8. Y. Shibata, T. Kida, S. Fukamachi, T. Takeda, A. Shinohara, S. Shinohara and S. Arikawa, "Byte-pair encoding: An text compression scheme that accelerates pattern matching", Technical Report, Department of Informatics, Kyushu University, Japan, 1999.
  9. Aydin Carus, Altan Mesut, "Comparison of the Performance of Compression Methods Depending on Natural Languages", International Scientific Conference 23-24 Nov 2007, GABROVO
Index Terms

Computer Science
Information Sciences

Keywords

Data Compression better compression rate byte-pair data transformation substitution using zero-frequency byte symbols