Research Article

Comparative Study of Web Spam Detection using Data Mining

by Chirag Nathwani, Viralkumar Prajapati, Deven Agravat
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 68 - Number 18
Year of Publication: 2013
Authors: Chirag Nathwani, Viralkumar Prajapati, Deven Agravat
10.5120/11680-6493

Chirag Nathwani, Viralkumar Prajapati, Deven Agravat. Comparative Study of Web Spam Detection using Data Mining. International Journal of Computer Applications. 68, 18 (April 2013), 26-29. DOI=10.5120/11680-6493

@article{ 10.5120/11680-6493,
author = { Chirag Nathwani and Viralkumar Prajapati and Deven Agravat },
title = { Comparative Study of Web Spam Detection using Data Mining },
journal = { International Journal of Computer Applications },
issue_date = { April 2013 },
volume = { 68 },
number = { 18 },
month = { April },
year = { 2013 },
issn = { 0975-8887 },
pages = { 26-29 },
numpages = {4},
url = { https://ijcaonline.org/archives/volume68/number18/11680-6493/ },
doi = { 10.5120/11680-6493 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:28:13.627348+05:30
%A Chirag Nathwani
%A Viralkumar Prajapati
%A Deven Agravat
%T Comparative Study of Web Spam Detection using Data Mining
%J International Journal of Computer Applications
%@ 0975-8887
%V 68
%N 18
%P 26-29
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Today the World Wide Web has become one of the best sources of information, which is a result of the speed of search engines. Web spam attempts to manipulate search engine algorithms so that specific web pages rank higher in search results than they deserve. One way to detect web spam is classification, that is, learning a model that labels web pages as spam or non-spam. This paper presents a comparative, empirical analysis of web spam detection using four data mining techniques: LAD Tree, JRIP, J48, and Random Forest. Experiments were carried out on three feature sets of the standard WEBSPAM-UK2007 dataset. Overall, Random Forest works well with content-based features and transformed link-based features, whereas LAD Tree was found to be the best of the four on link-based features. In terms of time efficiency, however, LAD Tree was found to be much more time-consuming than the other three classification techniques.
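
The kind of comparison the abstract describes (train several classifiers on the same feature set, then compare accuracy and training time) can be sketched as below. This is a minimal illustration only, not the authors' experimental setup: the data is synthetic, the two feature names are hypothetical, and `MajorityBaseline` and `DecisionStump` are toy stand-ins for the classifiers named in the abstract (LAD Tree, JRIP, J48, Random Forest), none of which is implemented here.

```python
import random
import time


class MajorityBaseline:
    """Toy stand-in: always predicts the most frequent training label."""

    def fit(self, rows, labels):
        self.label = max(set(labels), key=labels.count)
        return self

    def predict(self, row):
        return self.label


class DecisionStump:
    """Toy stand-in: a one-level decision tree (one threshold, one feature)."""

    def fit(self, rows, labels):
        best_correct = -1
        for f in range(len(rows[0])):
            for threshold in sorted({r[f] for r in rows}):
                for spam_side in (True, False):  # is "above threshold" spam?
                    correct = sum(
                        ((r[f] > threshold) == spam_side) == (y == 1)
                        for r, y in zip(rows, labels)
                    )
                    if correct > best_correct:
                        best_correct = correct
                        self.rule = (f, threshold, spam_side)
        return self

    def predict(self, row):
        f, threshold, spam_side = self.rule
        return 1 if (row[f] > threshold) == spam_side else 0


def make_toy_pages(n, seed=0):
    """Synthetic pages (assumption): label 1 = spam, 0 = non-spam."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        is_spam = 1 if rng.random() < 0.3 else 0
        # Hypothetical features: [keyword density, outlink ratio].
        keyword_density = rng.gauss(0.6 if is_spam else 0.2, 0.1)
        outlink_ratio = rng.random()  # uninformative noise feature
        rows.append([keyword_density, outlink_ratio])
        labels.append(is_spam)
    return rows, labels


def evaluate(model, train, test):
    """Return (test accuracy, training time in seconds) for one model."""
    (train_rows, train_labels), (test_rows, test_labels) = train, test
    start = time.perf_counter()
    model.fit(train_rows, train_labels)
    train_seconds = time.perf_counter() - start
    hits = sum(model.predict(r) == y for r, y in zip(test_rows, test_labels))
    return hits / len(test_labels), train_seconds


rows, labels = make_toy_pages(300)
train = (rows[:200], labels[:200])
test = (rows[200:], labels[200:])
results = {
    type(m).__name__: evaluate(m, train, test)
    for m in (MajorityBaseline(), DecisionStump())
}
for name, (accuracy, seconds) in results.items():
    print(f"{name}: accuracy={accuracy:.3f}, train_time={seconds:.4f}s")
```

Reporting accuracy and training time side by side mirrors the trade-off the abstract points to, where the most accurate classifier on one feature set may also be the slowest to train.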

Index Terms

Computer Science
Information Sciences

Keywords

Spam detection, Link spam, Content spam, Web spam, Web mining, JRIP, LAD tree, Decision tree, Random forest