Research Article

Improved Information Filtering and Feature Dimensionality Reduction using Semantic based Feature Dataset for Text Classification: In Context to Social Network

by Himanshu Suyal, R B Patel
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 94 - Number 18
Year of Publication: 2014
Authors: Himanshu Suyal, R B Patel
10.5120/16463-6194

Himanshu Suyal, R B Patel. Improved Information Filtering and Feature Dimensionality Reduction using Semantic based Feature Dataset for Text Classification: In Context to Social Network. International Journal of Computer Applications. 94, 18 (May 2014), 42-46. DOI=10.5120/16463-6194

@article{ 10.5120/16463-6194,
author = { Himanshu Suyal and R B Patel },
title = { Improved Information Filtering and Feature Dimensionality Reduction using Semantic based Feature Dataset for Text Classification: In Context to Social Network },
journal = { International Journal of Computer Applications },
issue_date = { May 2014 },
volume = { 94 },
number = { 18 },
month = { May },
year = { 2014 },
issn = { 0975-8887 },
pages = { 42-46 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume94/number18/16463-6194/ },
doi = { 10.5120/16463-6194 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:18:17.346863+05:30
%A Himanshu Suyal
%A R B Patel
%T Improved Information Filtering and Feature Dimensionality Reduction using Semantic based Feature Dataset for Text Classification: In Context to Social Network
%J International Journal of Computer Applications
%@ 0975-8887
%V 94
%N 18
%P 42-46
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

In micro-blogging web services such as Twitter, users are often bombarded with large volumes of information and raw data that they cannot sort into the right categories. Automatic text classification offers a solution to this problem. Social networking websites often limit users to short text messages of at most 140 characters, so continuously classifying this raw data is difficult: short text messages lack semantic information and carry a high risk of being misclassified. This paper develops a methodology that first prepares a semantic database and then uses it to extract binary classification features from a database of user tweets. The paper focuses on extracting nine features and then reducing them to seven using logical operations. Reducing the features not only lowers the complexity of the code but also saves the database memory required to store the extracted features for the master training database. The resulting features are easier to use and less complex to generate than features produced by other available algorithms such as Bag-of-Words.
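
The abstract does not list the nine features or the logical operations used to merge them, so the following is only a minimal sketch of the general idea under stated assumptions. The category names, word lists, and merge plan (`SEMANTIC_DB`, `MERGE_PLAN`) are hypothetical placeholders, not the authors' data. Each binary feature records whether a tweet contains any word from one semantic category, and the reduction combines related features with a logical OR.

```python
import re

# Hypothetical semantic database: category name -> trigger words.
# The paper's real categories and word lists are not given in the abstract.
SEMANTIC_DB = {
    "news": {"breaking", "report", "headline"},
    "opinion": {"think", "believe", "imho"},
    "deal": {"sale", "discount", "offer"},
    "event": {"today", "tonight", "tomorrow"},
    "question": {"why", "how", "what"},
    "emphasis": {"very", "really", "so"},
    "slang": {"lol", "omg", "btw"},
    "currency": {"usd", "dollar", "price"},
    "percent": {"percent", "off", "rate"},
}

# Hypothetical merge plan: each reduced feature is the logical OR
# of the listed source features (nine inputs, seven outputs).
MERGE_PLAN = [
    ("news",),
    ("opinion",),
    ("deal",),
    ("event",),
    ("question",),
    ("emphasis", "slang"),
    ("currency", "percent"),
]


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def extract_features(text):
    """Return one binary feature per semantic category."""
    tokens = tokenize(text)
    return {name: bool(tokens & words) for name, words in SEMANTIC_DB.items()}


def reduce_features(features):
    """Collapse the nine binary features to seven with logical OR."""
    return [any(features[name] for name in group) for group in MERGE_PLAN]


tweet = "Breaking: huge discount today, 50 percent off!"
nine = extract_features(tweet)
seven = reduce_features(nine)
print(len(nine), len(seven))
print(seven)
```

Because every feature is a single boolean, a tweet's feature vector has a fixed, small size regardless of vocabulary, which is the contrast the abstract draws with Bag-of-Words representations.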

Index Terms

Computer Science
Information Sciences

Keywords

Text classification, short text, Twitter, semantic, Bag-of-Words