Research Article

Machine Learning Classification Algorithms to Recognize Chart Types in Portable Document Format (PDF) Files

by V. Karthikeyani, S. Nagarajan
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 39 - Number 2
Year of Publication: 2012
Authors: V. Karthikeyani, S. Nagarajan
10.5120/4789-6997

V. Karthikeyani, S. Nagarajan. Machine Learning Classification Algorithms to Recognize Chart Types in Portable Document Format (PDF) Files. International Journal of Computer Applications. 39, 2 (February 2012), 1-5. DOI=10.5120/4789-6997

@article{ 10.5120/4789-6997,
author = { V. Karthikeyani and S. Nagarajan },
title = { Machine Learning Classification Algorithms to Recognize Chart Types in Portable Document Format (PDF) Files },
journal = { International Journal of Computer Applications },
issue_date = { February 2012 },
volume = { 39 },
number = { 2 },
month = { February },
year = { 2012 },
issn = { 0975-8887 },
pages = { 1-5 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume39/number2/4789-6997/ },
doi = { 10.5120/4789-6997 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:25:21.414937+05:30
%A V. Karthikeyani
%A S. Nagarajan
%T Machine Learning Classification Algorithms to Recognize Chart Types in Portable Document Format (PDF) Files
%J International Journal of Computer Applications
%@ 0975-8887
%V 39
%N 2
%P 1-5
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Chart recognition in PDF files is a relatively young research field in which techniques and algorithms are proposed to identify chart types and interpret them. This paper focuses on recognizing the type of a chart embedded in a PDF document using texture features and classification algorithms. Eleven types of texture features and three classifiers, namely multilayer perceptron (MLP), support vector machine (SVM), and K nearest neighbour (KNN), are used. Performance analysis of the proposed chart type recognition system shows that texture features are promising for chart type recognition and produce the best results with the KNN and SVM algorithms.
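
To make the pipeline in the abstract concrete, the following is a minimal sketch of texture-feature extraction followed by K nearest neighbour classification. It is not the authors' implementation. The abstract does not name the eleven texture features, so the three grey-level co-occurrence statistics used here (contrast, energy, homogeneity) are an assumption, and the function names `glcm_features` and `knn_predict`, the toy 4x4 images, and the labels "flat" and "striped" are hypothetical stand-ins for real chart images and chart types.

```python
import math
from collections import Counter


def glcm_features(image):
    """Return [contrast, energy, homogeneity] of a symmetric, horizontal
    grey-level co-occurrence matrix (assumed feature set, see lead-in)."""
    counts = Counter()
    for row in image:
        # Count each pair of horizontally adjacent pixels in both directions.
        for a, b in zip(row, row[1:]):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    total = sum(counts.values())
    contrast = energy = homogeneity = 0.0
    for (i, j), n in counts.items():
        p = n / total  # normalised co-occurrence probability
        contrast += p * (i - j) ** 2
        energy += p * p
        homogeneity += p / (1 + abs(i - j))
    return [contrast, energy, homogeneity]


def knn_predict(train, query, k=3):
    """Majority vote among the k training vectors closest to `query`.
    `train` is a list of (feature_vector, label) pairs."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy "images": uniform regions versus alternating stripes.
flat_a = [[0, 0, 0, 0]] * 4
flat_b = [[1, 1, 1, 1]] * 4
stripe_a = [[0, 3, 0, 3]] * 4
stripe_b = [[3, 0, 3, 0]] * 4

train = [
    (glcm_features(flat_a), "flat"),
    (glcm_features(flat_b), "flat"),
    (glcm_features(stripe_a), "striped"),
    (glcm_features(stripe_b), "striped"),
]

query = [[0, 3, 0, 2]] * 4  # mostly striped, with one altered column
predicted = knn_predict(train, glcm_features(query), k=3)
print(predicted)
```

In a real system the feature vectors would be computed from chart images extracted from PDF pages, and the same vectors could be passed to an SVM or MLP in place of `knn_predict` to compare classifiers as the paper does.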

Index Terms

Computer Science
Information Sciences

Keywords

Chart Classification, Texture Feature, Neural Network, Support Vector Machine, K Nearest Neighbour Classifier