Improved Information Filtering and Feature Dimensionality Reduction using Semantic based Feature Dataset for Text Classification: In Context to Social Network

Himanshu Suyal; R B Patel

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 94 - Number 18

Year of Publication: 2014

Authors: Himanshu Suyal, R B Patel

10.5120/16463-6194

Himanshu Suyal, R B Patel. Improved Information Filtering and Feature Dimensionality Reduction using Semantic based Feature Dataset for Text Classification: In Context to Social Network. International Journal of Computer Applications. 94, 18 (May 2014), 42-46. DOI=10.5120/16463-6194

@article{ 10.5120/16463-6194,

author = { Himanshu Suyal and R B Patel },

title = { Improved Information Filtering and Feature Dimensionality Reduction using Semantic based Feature Dataset for Text Classification: In Context to Social Network },

journal = { International Journal of Computer Applications },

issue_date = { May 2014 },

volume = { 94 },

number = { 18 },

month = { May },

year = { 2014 },

issn = { 0975-8887 },

pages = { 42-46 },

numpages = {9},

url = { https://ijcaonline.org/archives/volume94/number18/16463-6194/ },

doi = { 10.5120/16463-6194 },

publisher = {Foundation of Computer Science (FCS), NY, USA},

address = {New York, USA}

}

%0 Journal Article

%1 2024-02-06T22:18:17.346863+05:30

%A Himanshu Suyal

%A R B Patel

%T Improved Information Filtering and Feature Dimensionality Reduction using Semantic based Feature Dataset for Text Classification: In Context to Social Network

%J International Journal of Computer Applications

%@ 0975-8887

%V 94

%N 18

%P 42-46

%D 2014

%I Foundation of Computer Science (FCS), NY, USA

Abstract

In micro-blogging web services such as Twitter, users are often bombarded with large volumes of information and raw data that they cannot sort into the right categories. Automatic text classification offers a solution to this problem. Social networking websites often limit users to short text messages of at most 140 characters, so continuously classifying this raw data on microblogging websites is a difficult task. Short text messages are hard to classify because they lack semantic information and therefore carry a high risk of misclassification. This paper develops a methodology that prepares a semantic database and then uses it to extract binary classification features from a database of user tweets. The paper focuses on extracting nine features and then reducing them to seven using logical operations. Reducing the features not only lowers the complexity of the code but also saves the database memory needed to store the extracted features for the master training database. The resulting features are easier to use and less complex to generate than features produced by other available algorithms such as Bag-of-Words.
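
The abstract does not list the nine features or the logical operations used to merge them, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a small hypothetical semantic database (`SEMANTIC_DB`, with invented categories and terms), marks each feature as 1 when a tweet contains any term of that category, and then merges selected features with a logical OR to reduce the feature count. The function names and the merge rule are assumptions made for illustration.

```python
from typing import Dict, List, Set

# Hypothetical semantic database: category name -> indicative terms.
# The categories and terms are invented for illustration only.
SEMANTIC_DB: Dict[str, Set[str]] = {
    "news": {"breaking", "report", "election"},
    "sports": {"match", "goal", "score"},
    "opinion": {"think", "believe", "imo"},
    "deal": {"sale", "discount", "offer"},
}


def extract_binary_features(tweet: str, db: Dict[str, Set[str]]) -> Dict[str, int]:
    """Return one 0/1 feature per category: 1 if any category term occurs."""
    # Lowercase each whitespace-separated token and strip common punctuation.
    tokens = {tok.strip(".,!?#@:;\"'").lower() for tok in tweet.split()}
    tokens.discard("")
    return {name: int(bool(tokens & terms)) for name, terms in db.items()}


def reduce_features(features: Dict[str, int],
                    merges: Dict[str, List[str]]) -> Dict[str, int]:
    """Merge groups of binary features with a logical OR (assumed rule)."""
    merged_away = {src for group in merges.values() for src in group}
    # Keep every feature that is not part of a merge group.
    reduced = {k: v for k, v in features.items() if k not in merged_away}
    # Each merge group collapses into one feature that is 1 if any member is 1.
    for new_name, group in merges.items():
        reduced[new_name] = int(any(features[src] for src in group))
    return reduced


tweet = "Breaking: big discount on match tickets!"
features = extract_binary_features(tweet, SEMANTIC_DB)
reduced = reduce_features(features, {"news_or_opinion": ["news", "opinion"]})
print(features)
print(reduced)
```

Here four features are reduced to three; the paper's reduction from nine to seven would follow the same pattern with two merge steps, though the actual features and operators may differ.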

Index Terms

Computer Science

Information Sciences

Keywords

Text classification, short text, Twitter, semantic, Bag-of-Words