Research Article

Analysis of Random Forest and Naive Bayes for Spam Mail using Feature Selection Catagorization

by Rachana Mishra, R. S. Thakur
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 80 - Number 3
Year of Publication: 2013
Authors: Rachana Mishra, R. S. Thakur
10.5120/13844-1670

Rachana Mishra, R. S. Thakur. Analysis of Random Forest and Naive Bayes for Spam Mail using Feature Selection Catagorization. International Journal of Computer Applications. 80, 3 (October 2013), 42-47. DOI=10.5120/13844-1670

@article{ 10.5120/13844-1670,
author = { Rachana Mishra, R. S. Thakur },
title = { Analysis of Random Forest and Naive Bayes for Spam Mail using Feature Selection Catagorization },
journal = { International Journal of Computer Applications },
issue_date = { October 2013 },
volume = { 80 },
number = { 3 },
month = { October },
year = { 2013 },
issn = { 0975-8887 },
pages = { 42-47 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume80/number3/13844-1670/ },
doi = { 10.5120/13844-1670 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:53:36.340982+05:30
%A Rachana Mishra
%A R. S. Thakur
%T Analysis of Random Forest and Naive Bayes for Spam Mail using Feature Selection Catagorization
%J International Journal of Computer Applications
%@ 0975-8887
%V 80
%N 3
%P 42-47
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

As the number of internet users increases, spam mail has become a major problem, and reducing it is a big challenge for researchers. Spam is commonly defined as unsolicited email messages, and the goal of spam categorization is to distinguish between spam and legitimate email messages. This paper demonstrates the classification of spam mail and addresses various problems related to web space. Many machine learning algorithms are used to classify spam and legitimate mail. This paper identifies the best classification approach using a benchmark dataset. The dataset consists of 9324 records and 500 attributes, used for training and testing to build the model. This work can play a significant role in helping to eliminate unsolicited commercial e-mail, viruses, Trojans, and worms, as well as frauds perpetrated electronically and other undesired and troublesome e-mail. Three supervised machine learning algorithms, namely Naive Bayes, Random Tree, and Random Forest, were applied to the spam mail dataset using two feature selection algorithms.
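The pipeline the abstract describes (select a subset of attributes, then train a classifier on them) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy messages, the function names, the choice of a Bernoulli-style Naive Bayes, and the simple presence-rate score used for feature selection. The abstract does not name the two feature selection algorithms the paper used, and the keywords suggest the experiments were run in Weka, so this is not the authors' implementation. Random Tree and Random Forest are not covered here.

```python
import math
from collections import Counter

# Hypothetical toy corpus; NOT the 9324-record benchmark dataset from the paper.
TRAIN = [
    ("win cash prize now", "spam"),
    ("cheap loans win big", "spam"),
    ("claim your free prize", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the project report", "ham"),
    ("lunch on monday with the team", "ham"),
]

def doc_freq(docs):
    """Per label, count how many documents contain each word."""
    df, n = {}, Counter()
    for text, label in docs:
        n[label] += 1
        df.setdefault(label, Counter()).update(set(text.split()))
    return df, n

def select_features(docs, k):
    """Keep the k words whose presence rate differs most between classes.

    Assumption: this score is a simple stand-in, not one of the paper's
    (unnamed) feature selection algorithms. Ties are broken alphabetically.
    """
    df, n = doc_freq(docs)
    vocab = set(df["spam"]) | set(df["ham"])

    def score(word):
        return abs(df["spam"][word] / n["spam"] - df["ham"][word] / n["ham"])

    return sorted(vocab, key=lambda w: (-score(w), w))[:k]

def train_nb(docs, features):
    """Fit a Bernoulli Naive Bayes model restricted to the selected features."""
    df, n = doc_freq(docs)
    total = sum(n.values())
    model = {}
    for label in n:
        log_prior = math.log(n[label] / total)
        # Laplace smoothing keeps every probability strictly between 0 and 1.
        probs = {w: (df[label][w] + 1) / (n[label] + 2) for w in features}
        model[label] = (log_prior, probs)
    return model

def predict(model, text):
    """Return the label with the highest log-posterior for the message."""
    words = set(text.split())
    scores = {}
    for label, (log_prior, probs) in model.items():
        scores[label] = log_prior + sum(
            math.log(p if w in words else 1.0 - p) for w, p in probs.items()
        )
    return max(scores, key=scores.get)

features = select_features(TRAIN, 4)
model = train_nb(TRAIN, features)
print(features)
print(predict(model, "win a free prize"))
```

On this toy data the four selected words are `monday`, `prize`, `the`, and `win`, and the message "win a free prize" is labeled `spam`. On a real dataset the selection step would reduce the 500 attributes to a smaller subset before training, which is the role feature selection plays in the abstract.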

Index Terms

Computer Science
Information Sciences

Keywords

spam problem, spam classification, weka