Research Article

Choosing Shape Features by means of Genetic Algorithms for Glyph-clustering of Historical Documents

by Jan-Hendrik Worch, Bjoern Gottfried
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 102 - Number 3
Year of Publication: 2014
DOI: 10.5120/17792-8585

Jan-Hendrik Worch, Bjoern Gottfried. Choosing Shape Features by means of Genetic Algorithms for Glyph-clustering of Historical Documents. International Journal of Computer Applications. 102, 3 (September 2014), 1-6. DOI=10.5120/17792-8585

@article{ 10.5120/17792-8585,
author = { Jan-Hendrik Worch and Bjoern Gottfried },
title = { Choosing Shape Features by means of Genetic Algorithms for Glyph-clustering of Historical Documents },
journal = { International Journal of Computer Applications },
issue_date = { September 2014 },
volume = { 102 },
number = { 3 },
month = { September },
year = { 2014 },
issn = { 0975-8887 },
pages = { 1-6 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume102/number3/17792-8585/ },
doi = { 10.5120/17792-8585 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:32:07.449006+05:30
%A Jan-Hendrik Worch
%A Bjoern Gottfried
%T Choosing Shape Features by means of Genetic Algorithms for Glyph-clustering of Historical Documents
%J International Journal of Computer Applications
%@ 0975-8887
%V 102
%N 3
%P 1-6
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This paper presents a solution to a feature selection problem in the field of document image processing. Choosing shape features for describing glyphs in historical documents is a non-trivial task, since the variations of glyphs across different documents are innumerable. Hence, selecting shape features manually would be cumbersome. To select a subset of features from a given set, a genetic algorithm is used which optimises the result of a clustering process carried out by x-means. The result of x-means is evaluated using different quality measures. The optimisation methodology is illustrated in a case study in which the selection of an appropriate set of features is a crucial part of the system. The intended application supports users who are transcribing historical documents by showing them similar occurrences of a given glyph.
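
The selection loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: plain k-means with a fixed k stands in for x-means (which would also estimate the number of clusters), cluster purity against hypothetical ground-truth labels stands in for the unspecified quality measures, and the synthetic data, the function names (`make_data`, `kmeans`, `purity`, `fitness`, `select_features`) and all parameter values are invented for illustration. Each individual is a bit mask over the feature set, and its fitness is the quality of the clustering obtained on the selected features only.

```python
import random

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, seed=0, iters=15):
    """Plain k-means with a fixed seed (assumed stand-in for x-means)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centre if a cluster runs empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def purity(labels, truth):
    """Fraction of samples that belong to the majority class of their cluster."""
    total = 0
    for c in set(labels):
        classes = [t for l, t in zip(labels, truth) if l == c]
        total += max(classes.count(t) for t in set(classes))
    return total / len(labels)

def fitness(mask, data, truth, k):
    """Cluster on the masked features only and score the result."""
    if not any(mask):
        return 0.0
    projected = [[x for x, m in zip(row, mask) if m] for row in data]
    return purity(kmeans(projected, k), truth)

def select_features(data, truth, k, pop_size=12, generations=15, p_mut=0.1, seed=1):
    rng = random.Random(seed)
    n = len(data[0])
    cache = {}

    def score(mask):  # memoised, since masks recur across generations
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(mask, data, truth, k)
        return cache[key]

    # Start from the full feature set plus random masks.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        nxt = [pop[0][:]]  # elitism: the best mask survives unchanged
        while len(nxt) < pop_size:
            a = max(rng.sample(pop, 2), key=score)  # binary tournament selection
            b = max(rng.sample(pop, 2), key=score)
            cut = rng.randrange(1, n)               # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            nxt.append(child)
        pop = nxt
    best = max(pop, key=score)
    return best, score(best)

def make_data(seed=0, per_class=20):
    """Synthetic data: two informative features followed by four noise features."""
    rng = random.Random(seed)
    data, truth = [], []
    for cls, (cx, cy) in enumerate([(0, 0), (5, 0), (0, 5)]):
        for _ in range(per_class):
            informative = [rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)]
            noise = [rng.gauss(0, 4) for _ in range(4)]
            data.append(informative + noise)
            truth.append(cls)
    return data, truth

if __name__ == "__main__":
    data, truth = make_data()
    mask, quality = select_features(data, truth, k=3)
    print("selected feature mask:", mask, "purity:", quality)
```

Because the full feature set is part of the initial population and the best mask is always carried over, the returned mask scores at least as well as using all features under the chosen measure. Swapping in a real x-means implementation or a different quality measure only requires changing `fitness`.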

Index Terms

Computer Science
Information Sciences

Keywords

Document Image Processing, Genetic Algorithms, Feature Selection, Shape Descriptions, Glyph Clustering, X-Means