Research Article

Document Clustering using Learning from Examples

by G. Thavasi Raja, R. Malmathanraj, M. Arun
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 39 - Number 12
Year of Publication: 2012
Authors: G. Thavasi Raja, R. Malmathanraj, M. Arun
DOI: 10.5120/4872-7299

G. Thavasi Raja, R. Malmathanraj, M. Arun. Document Clustering using Learning from Examples. International Journal of Computer Applications. 39, 12 (February 2012), 17-24. DOI=10.5120/4872-7299

@article{ 10.5120/4872-7299,
author = { G. Thavasi Raja and R. Malmathanraj and M. Arun },
title = { Document Clustering using Learning from Examples },
journal = { International Journal of Computer Applications },
issue_date = { February 2012 },
volume = { 39 },
number = { 12 },
month = { February },
year = { 2012 },
issn = { 0975-8887 },
pages = { 17-24 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume39/number12/4872-7299/ },
doi = { 10.5120/4872-7299 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:26:17.484836+05:30
%A G. Thavasi Raja
%A R. Malmathanraj
%A M. Arun
%T Document Clustering using Learning from Examples
%J International Journal of Computer Applications
%@ 0975-8887
%V 39
%N 12
%P 17-24
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Information filtering (IF) systems usually filter data items by correlating a set of terms representing the user’s interest with similar sets of terms representing the data items. Many techniques have been employed for constructing user profiles automatically, but they usually yield large sets of data. Various dimensionality-reduction techniques can be applied to reduce the number of terms in a user query. A new framework is described that classifies large-scale document collections and retrieves the documents related to the user’s query by applying a trained artificial neural network (ANN) model. Its novel feature is the identification of an optimal set of documents that are relevant to the user. As a case study, government orders issued by the government of Tamil Nadu, a state in India, are classified according to their semantic similarity. Various neural architectures, such as the back propagation neural network (BPN), radial basis function (RBF) network, Learning Vector Quantization (LVQ) and Support Vector Machines (SVM), are used, and their performance is analyzed.
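The term-correlation step that the abstract attributes to IF systems can be illustrated with a minimal sketch. It is not the authors' ANN framework: it assumes a plain term-frequency representation with cosine similarity, and the function names, the threshold value, and the sample profile and documents are all hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Lowercase the text, split on whitespace, and count term frequencies."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as unrelated to everything
    return dot / (norm_a * norm_b)


def filter_documents(profile_terms, documents, threshold=0.2):
    """Return (score, document) pairs at or above the threshold, best first.

    The threshold of 0.2 is an arbitrary illustrative value.
    """
    profile = term_vector(profile_terms)
    scored = [(cosine_similarity(profile, term_vector(d)), d) for d in documents]
    kept = [pair for pair in scored if pair[0] >= threshold]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


# Hypothetical user profile and document titles, for illustration only.
profile = "education school scholarship"
docs = [
    "scholarship scheme for school students",
    "road construction tender notice",
    "school education department transfer order",
]

for score, doc in filter_documents(profile, docs):
    print(f"{score:.2f}  {doc}")
```

In this sketch the user profile and each document are reduced to term counts, and a document passes the filter only if its term vector is close enough to the profile. A trained classifier such as those named in the abstract would replace the fixed similarity threshold with a learned decision function.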

Index Terms

Computer Science
Information Sciences

Keywords

Document clustering, Artificial Neural Networks (ANN), Learning from examples