Text Extraction from PDF document


Research Article


Published in January 2013 by D. Sasirekha and E. Chandra

Amrita International Conference of Women in Computing - 2013

Foundation of Computer Science USA

AICWIC - Number 3

January 2013

Authors: D. Sasirekha, E. Chandra

D. Sasirekha, E. Chandra. Text Extraction from PDF document. Amrita International Conference of Women in Computing - 2013. AICWIC, 3 (January 2013), 17-19.

@article{
author = {D. Sasirekha and E. Chandra},
title = {Text Extraction from PDF document},
journal = {Amrita International Conference of Women in Computing - 2013},
issue_date = {January 2013},
volume = {AICWIC},
number = {3},
month = {January},
year = {2013},
issn = {0975-8887},
pages = {17-19},
numpages = {3},
url = {/proceedings/aicwic/number3/9876-1318/},
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

%0 Proceeding Article
%1 Amrita International Conference of Women in Computing - 2013
%A D. Sasirekha
%A E. Chandra
%T Text Extraction from PDF document
%J Amrita International Conference of Women in Computing - 2013
%@ 0975-8887
%V AICWIC
%N 3
%P 17-19
%D 2013
%I International Journal of Computer Applications

Abstract

PDF is now widely regarded as a universal document format. A PDF-to-speech converter involves several processing steps, and extracting the text from the PDF is the first of them, since every later stage works on that text. This paper begins with a brief discussion of the steps involved in extracting text from PDF documents. Its aim is to introduce basic PDF concepts and text extraction concepts that will be useful to readers who are less familiar with this area of research.
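As a rough illustration of the kind of extraction step the abstract refers to, the sketch below pulls text out of a PDF's content streams. It relies only on standard PDF structure: page content lives in `stream ... endstream` objects, often compressed with the FlateDecode (zlib) filter, and text is painted inside `BT ... ET` text objects by operators such as `Tj` (show one string) and `TJ` (show an array of strings interleaved with spacing adjustments). This is not the authors' method, and the names `extract_text` and `unescape` are hypothetical. It is a minimal sketch under loud assumptions: no cross-reference or object parsing, no encryption, no hex strings, and no font encodings or ToUnicode maps, so it recovers text only from simple files.

```python
import re
import zlib

# A minimal sketch, NOT a full PDF parser: it scans raw bytes for content
# streams and pulls out literal strings shown by the Tj and TJ operators.
# Assumptions: no xref/object parsing, no encryption, no hex strings (<...>),
# no font encodings or ToUnicode CMaps (bytes are decoded as Latin-1).

# Stream bodies; (?<!end) stops "endstream" from being read as "stream".
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.S)
# Text objects are delimited by the BT ... ET operators.
TEXT_OBJ_RE = re.compile(rb"\bBT\b(.*?)\bET\b", re.S)
# A literal string "( ... )" with backslash escapes but no nested parentheses.
LIT = rb"\(((?:\\.|[^\\()])*)\)"
# Either "(string) Tj" or "[(str) kern (str) ...] TJ".
SHOW_RE = re.compile(LIT + rb"\s*Tj|\[(.*?)\]\s*TJ", re.S)
LIT_RE = re.compile(LIT, re.S)

ESCAPES = {b"n": b"\n", b"r": b"\r", b"t": b"\t", b"b": b"\b", b"f": b"\f",
           b"(": b"(", b")": b")", b"\\": b"\\", b"\n": b""}


def unescape(raw: bytes) -> bytes:
    """Resolve backslash escapes in a PDF literal string, including octal codes."""
    def repl(m):
        esc = m.group(1)
        if esc[0] in b"01234567":
            return bytes([int(esc, 8) & 0xFF])  # \ddd octal character code
        return ESCAPES.get(esc, esc)  # unknown escape: keep the character
    return re.sub(rb"\\([0-7]{1,3}|.)", repl, raw, flags=re.S)


def extract_text(pdf_bytes: bytes) -> str:
    """Return one line of text per BT...ET text object, in file order."""
    lines = []
    for stream in STREAM_RE.finditer(pdf_bytes):
        data = stream.group(1)
        try:
            # Try FlateDecode; decompressobj tolerates the trailing EOL byte(s).
            data = zlib.decompressobj().decompress(data)
        except zlib.error:
            pass  # not zlib data: treat the stream as uncompressed
        for text_obj in TEXT_OBJ_RE.finditer(data):
            parts = []
            for show in SHOW_RE.finditer(text_obj.group(1)):
                if show.group(1) is not None:  # (string) Tj
                    parts.append(unescape(show.group(1)))
                else:  # [ ... ] TJ: keep the strings, drop the spacing numbers
                    parts.extend(unescape(s) for s in LIT_RE.findall(show.group(2)))
            if parts:
                lines.append(b"".join(parts).decode("latin-1"))
    return "\n".join(lines)


if __name__ == "__main__":
    plain = b"BT /F1 12 Tf 72 712 Td (Hello, PDF) Tj ET"
    packed = zlib.compress(b"BT [(Te) 120 (xt) -250 (\\(PDF\\))] TJ ET")
    obj4 = b"4 0 obj\n<< /Length %d >>\nstream\n" % len(plain)
    obj5 = b"5 0 obj\n<< /Length %d /Filter /FlateDecode >>\nstream\n" % len(packed)
    end = b"\nendstream\nendobj\n"
    pdf = b"%PDF-1.4\n" + obj4 + plain + end + obj5 + packed + end
    print(extract_text(pdf))  # prints "Hello, PDF" and "Text(PDF)" on two lines
```

Real documents call for a proper PDF library: the bytes inside a string are character codes whose meaning depends on the font's encoding, and decoding them as Latin-1 only roughly works for simple standard encodings. Dropping the `TJ` spacing numbers also loses word gaps that some producers encode as large offsets instead of space characters, which is one reason extracted words can run together.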


Index Terms

Computer Science

Information Sciences

Keywords

Text Extraction, PDF, Text Extraction Technique