A First Step Towards the Development of Yoruba Named Entity Recognition System

International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2019
Ikechukwu I. Ayogu, Adebayo O. Adetunmbi, Bosede A. Ayogu

Ikechukwu I Ayogu, Adebayo O Adetunmbi and Bosede A Ayogu. A First Step Towards the Development of Yoruba Named Entity Recognition System. International Journal of Computer Applications 182(41):1-4, February 2019. BibTeX

	author = {Ikechukwu I. Ayogu and Adebayo O. Adetunmbi and Bosede A. Ayogu},
	title = {A First Step Towards the Development of Yoruba Named Entity Recognition System},
	journal = {International Journal of Computer Applications},
	issue_date = {February 2019},
	volume = {182},
	number = {41},
	month = {Feb},
	year = {2019},
	issn = {0975-8887},
	pages = {1-4},
	numpages = {4},
	url = {},
	doi = {10.5120/ijca2019918465},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}


The named entity recognition (NER) task can be considered solved for English and a few other European languages, given the research outputs, tools, resources and applications available for them. The situation is sharply different for Nigerian and most other African languages, which motivates the research reported in this paper. The paper explores how well some language-independent features recognize mentions of persons, locations and organizations in Yorùbá text in a supervised machine learning set-up. The results are promising, but further investigation revealed that the size of the training corpus remains an issue that needs to be addressed.
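
To make "language-independent features" concrete, the sketch below builds a per-token feature dictionary from surface and context cues only, with no lexicon or gazetteer. The abstract does not list the features the authors used, so every feature here (word shape, affixes, capitalization, neighbouring tokens) is an assumption drawn from common NER practice, and the function names `word_shape`, `token_features` and `sentence_features` are hypothetical. The example sentence is made up for illustration.

```python
import unicodedata
from typing import Dict, List


def word_shape(token: str) -> str:
    """Coarse shape: uppercase -> X, lowercase -> x, digit -> d, others kept."""
    shape = []
    # NFC-normalise so tone-marked vowels compose where a composed form exists.
    for ch in unicodedata.normalize("NFC", token):
        if unicodedata.combining(ch):
            continue  # skip leftover combining tone or under-dot marks
        if ch.isupper():
            shape.append("X")
        elif ch.islower():
            shape.append("x")
        elif ch.isdigit():
            shape.append("d")
        else:
            shape.append(ch)
    return "".join(shape)


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Feature dict for tokens[i]; the feature choice is an assumption, not the paper's."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        "is_all_caps": tok.isupper(),
        "has_digit": any(ch.isdigit() for ch in tok),
        "shape": word_shape(tok),
        "prefix3": tok[:3],
        "suffix3": tok[-3:],
        "is_first": i == 0,
        # One-token context window on each side, with boundary markers.
        "prev_lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """Feature dicts for every token in one sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    # Illustrative sentence (roughly "Adé went to Ìbàdàn"); not from the paper's corpus.
    for feats in sentence_features(["Adé", "lọ", "sí", "Ìbàdàn"]):
        print(feats)
```

A list of such dictionaries per sentence, paired with per-token labels for persons, locations and organizations, is the usual input to a supervised sequence labeller. Skipping combining marks in `word_shape` keeps the shape stable whether a tone-marked character is stored in composed or decomposed form.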




Named entities, NER, Yoruba language, Natural language processing