Research Article

VSM Based Classification of Data Objects with Individual Treatment of Continuous and Discrete Attributes

Published in March 2012 by Komal Kumar Bhatia, Atul Srivastava, Veena Garg
International Conference and Workshop on Emerging Trends in Technology
Foundation of Computer Science USA
ICWET2012 - Number 12

Komal Kumar Bhatia, Atul Srivastava, Veena Garg. VSM Based Classification of Data Objects with Individual Treatment of Continuous and Discrete Attributes. International Conference and Workshop on Emerging Trends in Technology. ICWET2012, 12 (March 2012), 37-41.

@article{
author = { Komal Kumar Bhatia and Atul Srivastava and Veena Garg },
title = { VSM Based Classification of Data Objects with Individual Treatment of Continuous and Discrete Attributes },
journal = { International Conference and Workshop on Emerging Trends in Technology },
issue_date = { March 2012 },
volume = { ICWET2012 },
number = { 12 },
month = { March },
year = { 2012 },
issn = { 0975-8887 },
pages = { 37-41 },
numpages = 5,
url = { /proceedings/icwet2012/number12/5405-1096/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 International Conference and Workshop on Emerging Trends in Technology
%A Komal Kumar Bhatia
%A Atul Srivastava
%A Veena Garg
%T VSM Based Classification of Data Objects with Individual Treatment of Continuous and Discrete Attributes
%J International Conference and Workshop on Emerging Trends in Technology
%@ 0975-8887
%V ICWET2012
%N 12
%P 37-41
%D 2012
%I International Journal of Computer Applications
Abstract

Classification is a data mining technique for identifying the class membership of a data object. In this paper we present a classification technique that extends an existing information retrieval method, the Vector Space Model (VSM). VSM is applied to text data and is generally used in information retrieval to determine the relevance of a query to web pages. Data objects fall into two groups according to their attributes: those with discrete-valued attributes and those with continuous-valued attributes. Almost every previous attempt in this area has treated the two groups separately. To make a classifier scale to both, one type (discrete or continuous) is converted to the other, and this conversion can reduce accuracy. In this paper, continuous and discrete attributes are treated individually without tampering with their representation. The paper adapts VSM for classification in the same way it is used to determine query relevance in information retrieval. The results show that the enhanced model performs very well and that the setup time is also satisfactory for a large collection of data objects. The paper is organized as follows. Section 1 covers the basic terminology of classification and introduces the vector space model. Section 2 reviews related work from the literature. Section 3 describes model construction for classification, that is, the simulation of the existing vector space model for information retrieval and its use for classifying an unseen data tuple. Section 4 gives pseudocode for VSM classification. Section 5 presents the experiment and analyzes the results through an example. Section 6 concludes the paper and discusses future work.
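The abstract does not give the paper's scoring formula, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that discrete attributes are compared by exact match, that continuous attributes are compared by cosine similarity on their raw values, that the two scores are averaged with equal weight, and that an unseen tuple takes the label of its most similar training tuple. The function names, attribute names, and sample records are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity(a, b, discrete, continuous):
    """Score two records, keeping each attribute type in its own form."""
    # Discrete part: fraction of attributes with identical values
    # (no numeric encoding of the categories).
    match = sum(a[k] == b[k] for k in discrete) / len(discrete)
    # Continuous part: cosine of the raw numeric values (no binning).
    cos = cosine([a[k] for k in continuous], [b[k] for k in continuous])
    # Equal weighting of the two parts is an assumption of this sketch.
    return (match + cos) / 2


def classify(query, labelled, discrete, continuous):
    """Return the label of the training record most similar to the query."""
    _, best_label = max(
        labelled,
        key=lambda pair: similarity(query, pair[0], discrete, continuous),
    )
    return best_label


# Hypothetical training data: (record, class label) pairs.
DISCRETE = ["owns_home"]
CONTINUOUS = ["income", "age"]
labelled = [
    ({"owns_home": "yes", "income": 60.0, "age": 40.0}, "low_risk"),
    ({"owns_home": "no", "income": 20.0, "age": 22.0}, "high_risk"),
]
query = {"owns_home": "yes", "income": 55.0, "age": 45.0}

print(classify(query, labelled, DISCRETE, CONTINUOUS))
```

The point of scoring the two attribute groups separately is that neither group has to be discretized or numerically encoded to be compared, which is the property the abstract emphasizes. How the full paper weights and combines the groups may differ from this sketch.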

Index Terms

Computer Science
Information Sciences

Keywords

Information retrieval, Vector space model, Classification, Continuous attributes, Discrete attributes, Classification technique