VSM Based Classification of Data Objects with Individual Treatment of Continuous and Discrete Attributes

IJCA Proceedings on International Conference and workshop on Emerging Trends in Technology (ICWET 2012)
© 2012 by IJCA Journal
icwet2012 - Number 12
Year of Publication: 2012
Authors:
Komal Kumar Bhatia
Atul Srivastava
Veena Garg

Komal Kumar Bhatia, Atul Srivastava and Veena Garg. Article: VSM Based Classification of Data Objects with Individual Treatment of Continuous and Discrete Attributes. IJCA Proceedings on International Conference and workshop on Emerging Trends in Technology (ICWET 2012) icwet(12):37-41, March 2012. Full text available. BibTeX

@article{key:article,
	author = {Komal Kumar Bhatia and Atul Srivastava and Veena Garg},
	title = {Article: VSM Based Classification of Data Objects with Individual Treatment of Continuous and Discrete Attributes},
	journal = {IJCA Proceedings on International Conference and workshop on Emerging Trends in Technology (ICWET 2012)},
	year = {2012},
	volume = {icwet},
	number = {12},
	pages = {37-41},
	month = {March},
	note = {Full text available}
}

Abstract

Classification is a data mining technique for identifying the class membership of a data object. In this paper we present a classification technique that extends an existing information retrieval method, the Vector Space Model (VSM). VSM is applied to text data and is generally used in information retrieval to determine the relevance of web pages to a query. Data objects fall into two groups based on their attributes: those with discrete-valued attributes and those with continuous-valued attributes. Almost every previous attempt in this area has treated the two groups separately. To make the classifier scalable, one type of attribute (discrete or continuous) is converted to the other, and this conversion can sometimes hamper accuracy. In this paper, continuous and discrete attributes are instead treated individually, without tampering with their representation. The paper adapts VSM so that it is used for classification in the same way it is used to determine query relevance in information retrieval. The results show that the enhanced model performs very well and that its setup time is satisfactory for a large collection of data objects. The paper is organized as follows. Section 1 introduces the basic terminology of classification and the vector space model. Section 2 reviews related work from the literature. Section 3 describes model construction for classification, i.e. simulating the existing vector space model for information retrieval and using it to classify an unseen data tuple. Section 4 gives pseudocode for VSM classification. Section 5 presents the experiment and an analysis of the results through an example. Section 6 concludes the paper and discusses future work.
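
The abstract does not give the scoring formulas, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that the unseen tuple plays the role of the query and each class profile plays the role of a document. Continuous attributes are compared with the standard VSM cosine similarity against the class mean vector, and discrete attributes are scored by their relative frequency within the class, so neither type is converted to the other. The function names (`build_profiles`, `classify`), the frequency-based discrete score, and the equal weighting of the two scores are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(u, v):
    """Standard VSM cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_profiles(rows):
    """Build one profile per class from (continuous, discrete, label) rows.

    Hypothetical helper: each profile holds the mean of the continuous
    attributes, one value counter per discrete attribute, and the class size.
    """
    grouped = {}
    for cont, disc, label in rows:
        grouped.setdefault(label, []).append((cont, disc))
    profiles = {}
    for label, members in grouped.items():
        n = len(members)
        mean = [sum(col) / n for col in zip(*(c for c, _ in members))]
        counts = [Counter(col) for col in zip(*(d for _, d in members))]
        profiles[label] = (mean, counts, n)
    return profiles


def classify(profiles, cont, disc):
    """Return the label whose profile is most similar to the unseen tuple."""
    best_label, best_score = None, -math.inf
    for label, (mean, counts, n) in profiles.items():
        # Continuous part: cosine similarity, as in query/document matching.
        cont_score = cosine(cont, mean)
        # Discrete part (assumed): average relative frequency of each value.
        if disc:
            disc_score = sum(c[v] / n for c, v in zip(counts, disc)) / len(disc)
        else:
            disc_score = 0.0
        # Equal weighting of the two parts is an assumption of this sketch.
        score = (cont_score + disc_score) / 2
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, invented for illustration only.
rows = [
    ((1.0, 0.1), ("red",), "A"),
    ((0.9, 0.2), ("red",), "A"),
    ((0.1, 1.0), ("blue",), "B"),
    ((0.2, 0.8), ("blue",), "B"),
]
profiles = build_profiles(rows)
predicted = classify(profiles, (0.95, 0.15), ("red",))
```

In this toy data the unseen tuple matches class "A" on both its continuous direction and its discrete value. A real implementation would follow the model construction and pseudocode given in Sections 3 and 4 of the paper.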
