Research Article

An Approach for Predicting Related Word for the Hindi Language

by Monika Sharma, Dinesh Gopalani, Meenakshi Tripathi
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 123 - Number 6
Year of Publication: 2015
Authors: Monika Sharma, Dinesh Gopalani, Meenakshi Tripathi
10.5120/ijca2015905367

Monika Sharma, Dinesh Gopalani, Meenakshi Tripathi. An Approach for Predicting Related Word for the Hindi Language. International Journal of Computer Applications. 123, 6 (August 2015), 29-34. DOI=10.5120/ijca2015905367

@article{ 10.5120/ijca2015905367,
author = { Monika Sharma, Dinesh Gopalani, Meenakshi Tripathi },
title = { An Approach for Predicting Related Word for the Hindi Language },
journal = { International Journal of Computer Applications },
issue_date = { August 2015 },
volume = { 123 },
number = { 6 },
month = { August },
year = { 2015 },
issn = { 0975-8887 },
pages = { 29-34 },
numpages = {6},
url = { https://ijcaonline.org/archives/volume123/number6/21965-2015905367/ },
doi = { 10.5120/ijca2015905367 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Monika Sharma
%A Dinesh Gopalani
%A Meenakshi Tripathi
%T An Approach for Predicting Related Word for the Hindi Language
%J International Journal of Computer Applications
%@ 0975-8887
%V 123
%N 6
%P 29-34
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Without inspiration, writing can be a cumbersome process. This work proposes a methodology that assists users by providing reference information, such as related words, while they compose an article or message. Smart systems with related-word prediction have become extremely prevalent for the English language, but no comparable effort exists for Hindi. The main goal of this work is to provide syntactically and semantically related words based on continuous feature-vector representations. The Continuous Bag of Words (CBOW) language model is used to obtain a feature-vector representation of each word in the training set, and cosine distance combined with a rule-based strategy is used to find the most related word in context. A comparative study shows that the proposed method achieves higher accuracy than the existing method. This approach will help make Hindi writing more effective and creative.
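The core retrieval step the abstract describes, ranking vocabulary words by cosine distance to a target word's CBOW vector, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the embedding values below are hypothetical toy vectors standing in for CBOW-trained representations, and the rule-based filtering stage is omitted.

```python
import numpy as np

def cosine_similarity(u, v):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def most_related(word, embeddings, top_n=3):
    # Rank all other vocabulary words by cosine similarity to `word`.
    target = embeddings[word]
    scores = {w: cosine_similarity(target, vec)
              for w, vec in embeddings.items() if w != word}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Toy Hindi vocabulary with made-up 3-dimensional vectors; a real CBOW
# model would produce dense vectors of a few hundred dimensions.
embeddings = {
    "राजा": np.array([0.90, 0.10, 0.30]),   # "king"
    "रानी": np.array([0.85, 0.15, 0.35]),   # "queen"
    "सेब":  np.array([0.10, 0.90, 0.20]),   # "apple"
    "आम":   np.array([0.15, 0.85, 0.25]),   # "mango"
}

print(most_related("राजा", embeddings, top_n=1))  # → ['रानी']
```

In practice the vectors would come from training CBOW on a Hindi corpus, after which the same nearest-neighbour lookup returns candidate related words for the writer.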

References
  1. S. M. Katz. 1987. Estimation of probabilities from sparse data for the language model component of a speech recognizer. IEEE Transactions on Acoustics, Speech and Signal Processing, pp. 400–401.
  2. S. C. Douglas. 1998. Evaluation metrics for language models.
  3. S. Bengio and Y. Bengio. 2000. Taking on the curse of dimensionality in joint distributions using neural networks. Trans. Neur. Netw., vol. 11, no. 3, pp. 550–557.
  4. Y. Bengio. 2002. New distributed probabilistic language models. No. 1215.
  5. Y. Bengio, R. Ducharme, P. Vincent, and C. Janvin, “A neural probabilistic language model,” J. Mach. Learn. Res., vol. 3, pp. 1137–1155, Mar 2003.
  6. F. Morin and Y. Bengio, “Hierarchical probabilistic neural network language model,” pp. 246–252, 2005.
  7. L. van der Plas and J. Tiedemann, “Finding synonyms using automatic word alignment and measures of distributional similarity,” pp. 866–873, 2006.
  8. R. Nadig, J. Ramanand, and P. Bhattacharyya, “Automatic evaluation of WordNet synonyms and hypernyms,” Proceedings of ICON-2008: 6th International Conference on Natural Language Processing, pp. 8–31, 2008.
  9. R. M. K. Sinha, “A journey from Indian scripts processing to Indian language processing,” IEEE Annals of the History of Computing, vol. 31, no. 1, pp. 8–31, 2009.
  10. T. Mikolov, M. Karafiát, L. Burget, J. Černocký, and S. Khudanpur, “Recurrent neural network based language model,” pp. 1045–1048, 2010.
  11. J. Turian, L. Ratinov, and Y. Bengio, “Word representations: A simple and general method for semi-supervised learning,” pp. 384–394, 2010.
  12. S. Reddy and S. Sharoff, “Cross language POS taggers (and other tools) for Indian languages: An experiment with Kannada using Telugu resources,” November 2011.
  13. E. Arisoy, T. N. Sainath, B. Kingsbury, and B. Ramabhadran, “Deep neural network language models,” pp. 20–28, June 2012.
  14. N. Garg, V. Goyal, and S. Preet, “Rule based Hindi part of speech tagger,” COLING (Demos), pp. 163–174, 2012.
  15. N. Pappas and T. Meyer, “A survey on language modelling using neural networks,” no. Idiap-RR-32-2012, 2012.
  16. T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” CoRR, vol. abs/1301.3781, Oct 2013.
  17. A. Das, “Antarym: The smart keyboard for Indian languages,” In the Workshop on Techniques on Basic Tool Creation and Its Applications (TBTCIA 2013), ICON, no. 1215, 2013.
  18. T. Mikolov, W. tau Yih, and G. Zweig, “Linguistic regularities in continuous space word representations,” Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT-2013), 2013.
  19. T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, “Distributed representations of words and phrases and their compositionality,” pp. 3111–3119, 2013.
  20. Q. V. Le and T. Mikolov, “Distributed representations of sentences and documents,” pp. 1188–1196, 2014.
  21. S. Sachdeva and B. Kastore, “Document clustering: Similarity measures,” Project Report, IIT Kanpur, 2014.
  22. L. Qiu, Y. Cao, Z. Nie, Y. Yu, and Y. Rui, “Learning word representation considering proximity and ambiguity,” AAAI Conference on Artificial Intelligence., June 2014.
  23. W. D. Mulder, S. Bethard, and M.-F. Moens, “A survey on the application of recurrent neural networks to statistical language modeling,” Computer Speech Language, vol. 30, no. 1, pp. 61-98, 2015.
  24. D. Guthrie, B. Allison, W. Liu, L. Guthrie, and Y. Wilks, “A closer look at skip-gram modelling.”
  25. Polyglot - Rami Al-Rfou - Google Sites “https://sites.google.com/site/rmyeid/projects/polyglottocdownload-wikipedia-text-dumps.”
  26. Quillpad: http://www.quillpad.in/index.html.
  27. Fleksy Keyboard: http://fleksy.com/.
  28. Hindi WordNet (A Lexical Database for Hindi): http://www.cfilt.iitb.ac.in/wordnet/webhwn/.
  29. Universal approximation theorem: https://en.wikipedia.org/wiki/Universal_approximation_theorem.
  30. Neural net language models: http://www.scholarpedia.org/article/Neural_net_language_models.
  31. SwiftKey: http://swiftkey.com/en/.
Index Terms

Computer Science
Information Sciences

Keywords

Language Modelling, Curse of Dimensionality, Distributed Representation, CBOW, POS tagging.