Achieving Better Compression Applying Index-based Byte-Pair Transformation before Arithmetic Coding

International Journal of Computer Applications
© 2014 by IJCA Journal
Volume 90 - Number 13
Year of Publication: 2014
Jyotika Doshi
Savita Gandhi

Jyotika Doshi and Savita Gandhi. Article: Achieving Better Compression Applying Index-based Byte-Pair Transformation before Arithmetic Coding. International Journal of Computer Applications 90(13):42-47, March 2014. Full text available.

BibTeX:

	author = {Jyotika Doshi and Savita Gandhi},
	title = {Article: Achieving Better Compression Applying Index-based Byte-Pair Transformation before Arithmetic Coding},
	journal = {International Journal of Computer Applications},
	year = {2014},
	volume = {90},
	number = {13},
	pages = {42-47},
	month = {March},
	note = {Full text available}


Arithmetic coding is used in the entropy-encoding stage of many compression techniques. Further compression is not possible without changing the data model and increasing the redundancy in the data set. To increase redundancy, we apply index-based byte-pair transformation (BPT-I) as a pre-processing step before arithmetic coding. BPT-I transforms the most frequent byte-pairs (2-byte integers). The most frequent byte-pairs are sorted in order of frequency and divided into groups of 256 byte-pairs each. Each byte-pair in a group is then encoded using two tokens: the group number and the location within the group. The group number is denoted by a variable-length prefix codeword, whereas the location within the group is denoted by an 8-bit index. BPT-I is designed to be applied to any type of source, not necessarily text. The more groups considered during transformation, the better the compression. Experimental results show around 4.30% additional reduction in compressed file size when arithmetic coding is applied after the byte-pair transformation BPT-I.
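
The grouping step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names build_groups and pair_tokens are hypothetical, counting pairs at every byte offset is an assumption, and the variable-length prefix codeword for the group number is left out because the abstract does not specify it. The sketch only ranks byte-pairs by frequency, forms groups of 256, and maps each pair to its (group number, 8-bit index) tokens.

```python
from collections import Counter

GROUP_SIZE = 256  # from the abstract: each group holds 256 byte-pairs


def build_groups(data, max_groups):
    """Rank byte-pairs by frequency and split the top ones into groups of 256.

    Assumption: pairs are counted at every byte offset (overlapping);
    the abstract does not say how pairs are counted.
    """
    counts = Counter(data[i:i + 2] for i in range(len(data) - 1))
    ranked = [pair for pair, _ in counts.most_common(max_groups * GROUP_SIZE)]
    return [ranked[i:i + GROUP_SIZE] for i in range(0, len(ranked), GROUP_SIZE)]


def pair_tokens(groups):
    """Map each grouped byte-pair to its two tokens.

    The first token is the group number (which the paper encodes with a
    variable-length prefix codeword, not modelled here); the second is the
    position within the group, which always fits in 8 bits.
    """
    return {
        pair: (group_number, index)
        for group_number, group in enumerate(groups)
        for index, pair in enumerate(group)
    }


if __name__ == "__main__":
    sample = b"abababcd"
    tokens = pair_tokens(build_groups(sample, max_groups=1))
    print(tokens[b"ab"])  # most frequent pair: group 0, index 0
```

Because every group holds at most 256 pairs, the second token never needs more than one byte. Allowing more groups covers more distinct pairs, at the cost of longer group-number codewords for the less frequent ones.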

