Research Article

Multilabel Classification of Tweets

by Abha Tewari, Pratik Sawant, Jai Samtani, Sanket Sawant, Gaurav Massand
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 159 - Number 1
Year of Publication: 2017
Authors: Abha Tewari, Pratik Sawant, Jai Samtani, Sanket Sawant, Gaurav Massand
10.5120/ijca2017912209

Abha Tewari, Pratik Sawant, Jai Samtani, Sanket Sawant, Gaurav Massand. Multilabel Classification of Tweets. International Journal of Computer Applications. 159, 1 (Feb 2017), 1-4. DOI=10.5120/ijca2017912209

@article{ 10.5120/ijca2017912209,
author = { Abha Tewari and Pratik Sawant and Jai Samtani and Sanket Sawant and Gaurav Massand },
title = { Multilabel Classification of Tweets },
journal = { International Journal of Computer Applications },
issue_date = { Feb 2017 },
volume = { 159 },
number = { 1 },
month = { Feb },
year = { 2017 },
issn = { 0975-8887 },
pages = { 1-4 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume159/number1/26962-2017912209/ },
doi = { 10.5120/ijca2017912209 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T00:04:31.762664+05:30
%A Abha Tewari
%A Pratik Sawant
%A Jai Samtani
%A Sanket Sawant
%A Gaurav Massand
%T Multilabel Classification of Tweets
%J International Journal of Computer Applications
%@ 0975-8887
%V 159
%N 1
%P 1-4
%D 2017
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Many news providers use social networking and microblogging sites such as Twitter to share their news headlines. We propose a system that classifies tweets into groups and labels so that a user can find the tweets belonging to a particular category. We use 120-character tweets for our analysis. Tweets are extracted from a selection of active, verified Twitter accounts. Each tweet is first classified into one of two categories: spam or non-spam. Spam tweets are then further classified as advertisement, malicious, or URL links, and non-spam tweets are classified into six labels. The labeled tweets are used to train several machine learning techniques. The words of each tweet are treated as features, and a feature vector is created for each instance using the bag-of-words approach. The data is used to train SVM (Support Vector Machine), Naive Bayes, and K-nearest neighbor classifiers, and their efficiency is compared.
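The bag-of-words and Naive Bayes steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy tweets, the function and class names, the whitespace tokenizer, and the choice of add-one smoothing are all assumptions made for illustration, and only the first-stage spam versus non-spam split is shown. SVM and K-nearest neighbor classifiers would consume the same feature vectors.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase a tweet and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def build_vocabulary(tweets):
    """Map every distinct word in the training tweets to a column index."""
    words = sorted({word for tweet in tweets for word in tokenize(tweet)})
    return {word: index for index, word in enumerate(words)}


def bag_of_words(tweet, vocabulary):
    """Return the word-count vector of a tweet; unknown words are ignored."""
    vector = [0] * len(vocabulary)
    for word in tokenize(tweet):
        if word in vocabulary:
            vector[vocabulary[word]] += 1
    return vector


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, vectors, labels):
        self.labels = sorted(set(labels))
        n_features = len(vectors[0])
        label_counts = Counter(labels)
        # Sum the word counts of all training vectors, per label.
        word_counts = {label: [0] * n_features for label in self.labels}
        for vector, label in zip(vectors, labels):
            for i, count in enumerate(vector):
                word_counts[label][i] += count
        self.log_prior = {
            label: math.log(label_counts[label] / len(labels))
            for label in self.labels
        }
        self.log_likelihood = {}
        for label in self.labels:
            total = sum(word_counts[label]) + n_features
            self.log_likelihood[label] = [
                math.log((count + 1) / total) for count in word_counts[label]
            ]
        return self

    def predict(self, vector):
        """Return the label with the highest log posterior score."""
        def score(label):
            weights = self.log_likelihood[label]
            return self.log_prior[label] + sum(
                count * weight for count, weight in zip(vector, weights)
            )
        return max(self.labels, key=score)


# Hypothetical toy data; the paper's dataset and label names are not
# given in the abstract.
TRAIN = [
    ("win a free phone click this link now", "spam"),
    ("free offer click now to claim your prize", "spam"),
    ("government announces new budget for schools", "non-spam"),
    ("team wins the final match after late goal", "non-spam"),
]

vocabulary = build_vocabulary([text for text, _ in TRAIN])
model = MultinomialNaiveBayes().fit(
    [bag_of_words(text, vocabulary) for text, _ in TRAIN],
    [label for _, label in TRAIN],
)
prediction = model.predict(bag_of_words("click now for a free prize", vocabulary))
print(prediction)
```

A second classifier trained only on the spam tweets (or only on the non-spam tweets) could assign the finer-grained labels in the same way, which would mirror the two-stage scheme the abstract outlines.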

Index Terms

Computer Science
Information Sciences

Keywords

SVM (Support Vector Machine), NLP (Natural Language Processing), NB (Naïve Bayes), KNN (K-Nearest Neighbor)