Research Article

Unsupervised Text Classification and Search using Word Embeddings on a Self-Organizing Map

by Suraj Subramanian, Deepali Vora
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 156 - Number 11
Year of Publication: 2016
Authors: Suraj Subramanian, Deepali Vora
10.5120/ijca2016912570

Suraj Subramanian, Deepali Vora. Unsupervised Text Classification and Search using Word Embeddings on a Self-Organizing Map. International Journal of Computer Applications. 156, 11 (Dec 2016), 35-37. DOI=10.5120/ijca2016912570

@article{ 10.5120/ijca2016912570,
author = { Suraj Subramanian and Deepali Vora },
title = { Unsupervised Text Classification and Search using Word Embeddings on a Self-Organizing Map },
journal = { International Journal of Computer Applications },
issue_date = { Dec 2016 },
volume = { 156 },
number = { 11 },
month = { Dec },
year = { 2016 },
issn = { 0975-8887 },
pages = { 35-37 },
numpages = {3},
url = { https://ijcaonline.org/archives/volume156/number11/26756-2016912570/ },
doi = { 10.5120/ijca2016912570 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Suraj Subramanian
%A Deepali Vora
%T Unsupervised Text Classification and Search using Word Embeddings on a Self-Organizing Map
%J International Journal of Computer Applications
%@ 0975-8887
%V 156
%N 11
%P 35-37
%D 2016
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This paper presents the results of an experimental implementation of a document classifier that clusters contextual word embeddings on a self-organizing map (SOM). Document categorization becomes harder when there are no predefined categories into which documents can be bucketed or, conversely, when there are too many. The paper addresses both problems by modelling the major themes of the document corpus as a cluster map produced by a self-organizing neural network. The cluster map offers a visual way to explore the corpus, as well as a near-semantic search interface over the many concepts that appear across it.
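The pipeline outlined in the abstract can be sketched as follows: documents are represented as word-embedding vectors, clustered on a self-organizing map, and searched by mapping a query onto that map. This is a minimal sketch under stated assumptions, not the authors' implementation. The 3-dimensional vectors in `EMBEDDINGS` are hypothetical stand-ins for pretrained embeddings such as word2vec or GloVe. A document is assumed to be the mean of its word vectors. The grid size, the learning-rate and neighbourhood schedules, and the helper names (`embed_document`, `train_som`, `build_map`, `search`) are all illustrative choices.

```python
import math
import random

# HYPOTHETICAL toy 3-d word vectors standing in for pretrained embeddings
# (e.g. word2vec or GloVe), which typically have hundreds of dimensions.
EMBEDDINGS = {
    "goal": [0.9, 0.1, 0.0], "match": [0.8, 0.2, 0.1], "striker": [0.9, 0.0, 0.1],
    "vote": [0.1, 0.9, 0.0], "election": [0.0, 0.8, 0.2], "senate": [0.1, 0.9, 0.1],
    "cpu": [0.0, 0.1, 0.9], "compiler": [0.1, 0.0, 0.8], "kernel": [0.0, 0.2, 0.9],
}


def embed_document(text):
    """Assumed document vector: the mean of its known word vectors (None if none)."""
    vecs = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vecs:
        return None
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(len(vecs[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(weights, x):
    """Grid node whose weight vector is closest to x."""
    return min(weights, key=lambda node: sq_dist(weights[node], x))


def train_som(data, rows=3, cols=3, epochs=200, lr0=0.5, sigma0=1.5, seed=0):
    """Online Kohonen SOM training with a Gaussian neighbourhood on a rows x cols grid."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1 - frac)              # learning rate decays linearly
        sigma = sigma0 * (1 - frac) + 0.1  # neighbourhood radius shrinks, floor 0.1
        x = data[rng.randrange(len(data))]  # one random sample per step
        bmu = best_matching_unit(weights, x)
        for node, w in weights.items():
            grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
            h = math.exp(-grid_d2 / (2 * sigma ** 2))  # neighbourhood strength
            for i in range(dim):
                w[i] += lr * h * (x[i] - w[i])  # pull the node toward the sample
    return weights


def build_map(weights, docs):
    """Cluster map: each populated node becomes one unsupervised category."""
    clusters = {}
    for doc in docs:
        vec = embed_document(doc)
        if vec is not None:
            clusters.setdefault(best_matching_unit(weights, vec), []).append(doc)
    return clusters


def search(weights, clusters, query):
    """Near-semantic search: documents on the populated node closest to the query."""
    vec = embed_document(query)
    if vec is None or not clusters:
        return []
    node = min(clusters, key=lambda n: sq_dist(weights[n], vec))
    return clusters[node]


docs = ["striker goal match", "goal match", "election vote senate",
        "senate vote", "compiler kernel cpu", "cpu kernel"]
weights = train_som([embed_document(d) for d in docs])
clusters = build_map(weights, docs)
print(clusters)
print(search(weights, clusters, "striker"))
```

Each populated node acts as one category, so the number of categories does not have to be fixed in advance. The grid layout of a SOM also tends to place related nodes near each other, which is what makes the map browsable. Returning the nearest populated node, rather than the query's exact best-matching unit, is a choice made here so that sparse maps do not return empty results. Because the mean vector ignores word order, a query such as "match goal" lands on the same node as the document "goal match".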

Index Terms

Computer Science
Information Sciences

Keywords

Clustering, knowledge retrieval, natural language processing, neural nets, self-organizing map, topic modelling, semantic search, unsupervised.