Spotting Separator Points at Line Terminals in Compressed Document Images for Text-line Segmentation

International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2017
Authors:
Amarnath R., P. Nagabhushan
10.5120/ijca2017915133

Amarnath R. and P. Nagabhushan. Spotting Separator Points at Line Terminals in Compressed Document Images for Text-line Segmentation. International Journal of Computer Applications 172(4):40-47, August 2017.

@article{10.5120/ijca2017915133,
	author = {Amarnath R. and P. Nagabhushan},
	title = {Spotting Separator Points at Line Terminals in Compressed Document Images for Text-line Segmentation},
	journal = {International Journal of Computer Applications},
	issue_date = {August 2017},
	volume = {172},
	number = {4},
	month = {Aug},
	year = {2017},
	issn = {0975-8887},
	pages = {40-47},
	numpages = {8},
	url = {http://www.ijcaonline.org/archives/volume172/number4/28242-2017915133},
	doi = {10.5120/ijca2017915133},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}
}

Abstract

Line separators are used to segregate text-lines from one another in document image analysis. Finding the separator points at every line terminal in a document image enables text-line segmentation. Identifying the separators in handwritten text is especially challenging, and doing so directly on the compressed version of a document image is harder still; that is the objective of this research. Such an approach avoids the computational burden of decompressing a document for text-line segmentation. Document images are generally compressed using the run-length encoding (RLE) technique as per the CCITT standards, so the first column of the RLE data is a white column, holding the leading white run of each row. The value (depth) in the white column is very low when a row belongs to a text line, and the depth can be larger at the point of text-line separation. A longer consecutive sequence of such larger depths should indicate the gap between text lines, which provides the separator region. For over-separation and under-separation, corrective actions of deletion and insertion, respectively, are suggested. Extensive experiments are conducted on the compressed images of the benchmark datasets of ICDAR13 and Alireza et al. [17] to demonstrate the efficacy.
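The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `depth_threshold` and `min_rows` parameters, and the choice of the region midpoint as the separator point are all assumptions made for illustration, and the corrective deletion and insertion steps are not covered. The sketch derives the white column from a small uncompressed binary image only to stay self-contained; in the compressed domain that column would be read directly from the RLE data.

```python
from typing import List, Tuple


def first_white_runs(image: List[List[int]]) -> List[int]:
    """Return the length of the leading white run of each row.

    Assumption: pixels are 0 (white) or 1 (ink). This mimics the first
    (white) column of a row-wise run-length representation.
    """
    runs = []
    for row in image:
        n = 0
        for pixel in row:
            if pixel != 0:
                break
            n += 1
        runs.append(n)
    return runs


def separator_regions(
    runs: List[int], depth_threshold: int, min_rows: int
) -> List[Tuple[int, int]]:
    """Find maximal stretches of rows whose depth is at least depth_threshold.

    Only stretches of at least min_rows consecutive rows are kept. Both
    parameters are hypothetical tuning knobs, not values from the paper.
    Returns inclusive (first_row, last_row) pairs.
    """
    regions = []
    start = None
    for i, depth in enumerate(runs):
        if depth >= depth_threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_rows:
                regions.append((start, i - 1))
            start = None
    # Close a stretch that runs to the bottom of the image.
    if start is not None and len(runs) - start >= min_rows:
        regions.append((start, len(runs) - 1))
    return regions


def separator_points(regions: List[Tuple[int, int]]) -> List[int]:
    """Pick one row per region (its midpoint) as the separator point."""
    return [(first + last) // 2 for first, last in regions]


if __name__ == "__main__":
    # Two short "text lines" separated by a gap of mostly white rows.
    image = [
        [0, 1, 1, 0, 0, 0, 0, 0],
        [0, 0, 1, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 1, 0],
        [0, 1, 1, 1, 0, 0, 0, 0],
        [1, 1, 0, 0, 0, 0, 0, 0],
    ]
    runs = first_white_runs(image)
    regions = separator_regions(runs, depth_threshold=5, min_rows=2)
    print(runs, regions, separator_points(regions))
```

In this toy image the rows with small depths belong to text lines, while the three consecutive rows with large depths form the single separator region. Handling the right line terminal in the same way would require the trailing white run of each row, which this sketch does not compute.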

References

  1. CCITT: 'Recommendation T.6 – Facsimile Coding Schemes and Coding Control Function from Group 4, International Telecommunication Union', (Extract from the Blue Book), Geneva, 1988.
  2. CCITT: 'Recommendation T.4, Standardization of Group 3 Facsimile Apparatus for Document Transmission', Terminal Equipments and Protocols for Telematic Services, Vol. VII, Fascicle VII.3, Geneva, Tech. Rep., 1985.
  3. Shulan Deng, Shahram Latifi, and Junichi Kanai: 'Manipulation of Text Documents in the Modified Group 4 Domain', IEEE Second Workshop on Multimedia Signal Processing, 1998.
  4. C. Maa: 'Identifying the existence of bar codes in compressed images', CVGIP: Graphical Models and Image Processing, Vol. 56, pp. 352-356, 1994.
  5. Y. Shima, S. Kashioka, and J. Higashino: 'A High-speed Rotation Method for Binary Images Based on Coordinate Operation of Run Data', Systems and Computers in Japan, Vol. 20, No. 6, pp. 91-102, 1989.
  6. A.L. Spitz: 'Analysis of Compressed Document Images for Dominant Skew, Multiple Skew, and Logotype Detection', Computer Vision and Image Understanding, Vol. 70, No. 3, June, pp. 321–334, 1998.
  7. J. Kanai and A. D. Bagdanov: 'Projection profile based skew estimation algorithm for JBIG compressed images', International Journal on Document Analysis and Recognition (IJDAR’98), vol. 1, pp. 43–51, 1998.
  8. Mohammed Javed, P. Nagabhushan, and B.B. Chaudhuri: 'Extraction of Line Word Character Segments Directly from Run Length Compressed Printed Text Documents'.
  9. E. Regentova, S. Latifi, D. Chen, K. Taghva, and D. Yao: 'Document analysis by processing JBIG-encoded images', IJDAR, vol. 7, pp. 260-272, 2005.
  10. J. J. Hull: 'Document matching on CCITT Group 4 compressed images', SPIE Conference on Document Recognition IV, pp. 8–14, Feb 1997.
  11. J. J. Hull: 'Document image similarity and equivalence detection', International Journal on Document Analysis and Recognition (IJDAR’98), vol. 1, pp. 37–42, 1998.
  12. Y. Lu and C. L. Tan: 'Document retrieval from compressed images', Pattern Recognition, vol. 36, pp. 987–996, 2003.
  13. Mohammed Javed, P. Nagabhushan, and B.B. Chaudhuri: 'Extraction of line-word-character segments directly from run-length compressed printed text-documents', 2013 Fourth National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG).
  14. Mohammed Javed, P. Nagabhushan, and B.B. Chaudhuri: 'Direct Processing of Run-Length Compressed Document Image for Segmentation and Characterization of a Specified Block', International Journal of Computer Applications (0975 - 8887) Volume 83 - No.15, December 2013.
  15. Mohammed Javed, Krishnanand S.H, P. Nagabhushan, and B. B. Chaudhuri: 'Visualizing CCITT Group 3 and Group 4 TIFF Documents and Transforming to Run-Length Compressed Format Enabling Direct Processing in Compressed Domain', International Conference on Computational Modelling and Security (CMS 2016).
  16. Nikolaos Stamatopoulos, Basilis Gatos, Georgios Louloudis, Umapada Pal and Alireza Alaei: 'ICDAR2013 Handwriting Segmentation Contest', 2013 12th International Conference on Document Analysis and Recognition.
  17. Alireza Alaei, Umapada Pal and P. Nagabhushan: 'Dataset and Ground Truth for Handwritten Text in Four Different Scripts', Int. J. Patt. Recogn. Artif. Intell. 26, 1253001 (2012).
  18. Alireza Alaei, Umapada Pal and P. Nagabhushan: 'A New Scheme for Unconstrained Handwritten Text-line Segmentation', Pattern Recognition 44 (2011), 917–928.
  19. D. Brodic: 'Methodology for the Evaluation of the Algorithms for Text Line Segmentation Based on Extended Binary Classification', Measurement Science Review, Volume 11, No. 3, 2011.
  20. B. B. Chaudhuri and Chandranath Adak: 'An Approach for Detecting and Cleaning of Struck-out Handwritten Text', Pattern Recognition, 2016.

Keywords

Line separators, Document image analysis, Handwritten text, Compression and decompression, RLE, CCITT.