Research Article

A Comparative Study Between Naive Bayes and Neural Network (MLP) Classifier for Spam Email Detection

Published in April 2014 by Amit Kumar Sharma, Sudesh Kumar Prajapat, Mohammed Aslam
National Seminar on Recent Advances in Wireless Networks and Communications
Foundation of Computer Science USA
NWNC - Number 2

Amit Kumar Sharma, Sudesh Kumar Prajapat, Mohammed Aslam. A Comparative Study Between Naive Bayes and Neural Network (MLP) Classifier for Spam Email Detection. National Seminar on Recent Advances in Wireless Networks and Communications. NWNC, 2 (April 2014), 12-16.

@article{
author = { Amit Kumar Sharma and Sudesh Kumar Prajapat and Mohammed Aslam },
title = { A Comparative Study Between Naive Bayes and Neural Network (MLP) Classifier for Spam Email Detection },
journal = { National Seminar on Recent Advances in Wireless Networks and Communications },
issue_date = { April 2014 },
volume = { NWNC },
number = { 2 },
month = { April },
year = { 2014 },
issn = {0975-8887},
pages = { 12-16 },
numpages = 5,
url = { /proceedings/nwnc/number2/16116-1416/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 National Seminar on Recent Advances in Wireless Networks and Communications
%A Amit Kumar Sharma
%A Sudesh Kumar Prajapat
%A Mohammed Aslam
%T A Comparative Study Between Naive Bayes and Neural Network (MLP) Classifier for Spam Email Detection
%J National Seminar on Recent Advances in Wireless Networks and Communications
%@ 0975-8887
%V NWNC
%N 2
%P 12-16
%D 2014
%I International Journal of Computer Applications
Abstract

The continuing growth in demand for internet and email communication has given rise to spam emails, also known as unsolicited bulk mail. These emails slip into our mailboxes and affect our systems. Different filtering techniques are used to detect them, such as Random Forest, Naive Bayesian, SVM, and Neural Network classifiers. In this paper, we compare performance metrics for Bayesian classification and neural network approaches from data mining that are based entirely on the content of emails. The proposed method takes a data mining approach and provides an anti-spam filtering technique that segregates spam and ham emails in a large dataset. The filtering methods are machine learning techniques using ANN and Bayesian network based solutions. The approach is applied in practice to the Trec07 dataset.
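
As a rough illustration of the content-based Bayesian side of this comparison, the sketch below trains a multinomial Naive Bayes filter with Laplace smoothing and reports accuracy, precision, and recall. It is not the authors' implementation: the toy messages, the whitespace tokenizer, and all function names are assumptions made for this example, and it does not use the Trec07 dataset, stemming, feature selection, or the MLP classifier.

```python
import math
from collections import Counter

def tokenize(text):
    # Simplistic whitespace tokenizer (assumption; the paper mentions stemming).
    return text.lower().split()

def train_naive_bayes(samples):
    """samples: list of (text, label) pairs with label 'spam' or 'ham'."""
    doc_counts = Counter()
    word_counts = {"spam": Counter(), "ham": Counter()}
    for text, label in samples:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return doc_counts, word_counts, vocab

def predict(model, text):
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        # Log prior plus log likelihood of each known word (Laplace smoothed).
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

def evaluate(model, samples):
    # Confusion-matrix counts, treating 'spam' as the positive class.
    tp = fp = tn = fn = 0
    for text, label in samples:
        guess = predict(model, text)
        if guess == "spam" and label == "spam":
            tp += 1
        elif guess == "spam" and label == "ham":
            fp += 1
        elif guess == "ham" and label == "ham":
            tn += 1
        else:
            fn += 1
    return {
        "accuracy": (tp + tn) / len(samples),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Toy data, invented for illustration only.
TRAIN = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("cheap pills free offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project report attached", "ham"),
    ("lunch on monday with the team", "ham"),
]
TEST = [
    ("claim your free money", "spam"),
    ("monday project meeting", "ham"),
]

model = train_naive_bayes(TRAIN)
metrics = evaluate(model, TEST)
print(metrics)
```

The same evaluate function could score any other classifier that exposes a compatible predict step, which is one way a side-by-side comparison of the kind described here might be organized.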

Index Terms

Computer Science
Information Sciences

Keywords

Spam Filtering, Feature Selection, Stemming, Features Reduction, Naive Bayes, Neural Network, MLP