Text Categorization using Distributional Features and Semantic Equivalence

Tirupathaiah Kommi; Srikanth Jatla


Research Article


International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 30 - Number 7

Year of Publication: 2011

Authors: Tirupathaiah Kommi, Srikanth Jatla

DOI: 10.5120/3653-5105

Tirupathaiah Kommi, Srikanth Jatla. Text Categorization using Distributional Features and Semantic Equivalence. International Journal of Computer Applications 30, 7 (September 2011), 30-35. DOI=10.5120/3653-5105

@article{ 10.5120/3653-5105,
author = { Tirupathaiah Kommi and Srikanth Jatla },
title = { Text Categorization using Distributional Features and Semantic Equivalence },
journal = { International Journal of Computer Applications },
issue_date = { September 2011 },
volume = { 30 },
number = { 7 },
month = { September },
year = { 2011 },
issn = { 0975-8887 },
pages = { 30-35 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume30/number7/3653-5105/ },
doi = { 10.5120/3653-5105 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T20:16:56.943607+05:30
%A Tirupathaiah Kommi
%A Srikanth Jatla
%T Text Categorization using Distributional Features and Semantic Equivalence
%J International Journal of Computer Applications
%@ 0975-8887
%V 30
%N 7
%P 30-35
%D 2011
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Text categorization, the task of assigning predefined categories to documents, is widely used in the text mining domain. Previous researchers used the bag-of-words approach, which assigns values to words based on how frequently each word occurs in a document. The drawback of this approach is that it ignores every property of a word other than its frequency. This paper explores assigning additional values to a word, known as distributional features. This approach is novel; the distributional features include the position of a word's first occurrence and the compactness of its appearances. Our experimental results show that distributional features and semantic equivalence improve text categorization. The results also indicate that distributional features are especially useful when documents are long and written in a casual style. Semantic equivalence is used to extend the equivalence rough set approach.
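The abstract contrasts plain word counts with two distributional features: the position of a word's first occurrence and the compactness of its appearances. The minimal Python sketch below computes all three for a toy document. It assumes the document has already been split into segments (here, sentences) and uses one plausible normalization of each feature; the function names, the segmentation, and the exact formulas are illustrative assumptions, not the authors' definitions.

```python
from collections import Counter


def bag_of_words(parts):
    """Bag-of-words baseline: raw frequency of each token in the whole document."""
    return Counter(token for part in parts for token in part)


def distributional_features(parts, word):
    """Return (first_position, compactness) for `word`, or None if it is absent.

    Illustrative definitions (assumptions, not the paper's exact formulas):
    - `parts` is the document pre-split into segments (e.g. sentences),
      each a list of tokens;
    - first_position: index of the first segment containing `word`,
      divided by the number of segments (0.0 = appears in the first segment);
    - compactness: count-weighted mean distance of the word's occurrences
      from their centroid segment, divided by the number of segments
      (0.0 = all occurrences fall in a single segment).
    """
    n = len(parts)
    counts = [part.count(word) for part in parts]
    total = sum(counts)
    if total == 0:
        return None
    first = next(i for i, c in enumerate(counts) if c > 0)
    centroid = sum(i * c for i, c in enumerate(counts)) / total
    spread = sum(c * abs(i - centroid) for i, c in enumerate(counts)) / total
    return first / n, spread / n


# A toy four-sentence document, already tokenized.
doc = [
    ["rough", "set", "model"],
    ["text", "categorization"],
    ["rough", "set"],
    ["text", "mining"],
]

tf = bag_of_words(doc)
print(tf["rough"], tf["text"])                # 2 2
print(distributional_features(doc, "rough"))  # (0.0, 0.25)
print(distributional_features(doc, "text"))   # (0.25, 0.25)
print(distributional_features(doc, "model"))  # (0.0, 0.0)
```

Here `rough` and `text` have identical bag-of-words counts, yet their first-position values differ, which is the kind of extra signal a count alone cannot capture. How such values would be combined into term weights for a classifier is left open in this sketch.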


Index Terms

Computer Science

Information Sciences

Keywords

Text mining, machine learning, text categorization, distributional feature, tfidf