Research Article

Spam Mail Filtering Technique using Different Decision Tree Classifiers through Data Mining Approach - A Comparative Performance Analysis

by Sarit Chakraborty, Bikromadittya Mondal
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 47 - Number 16
Year of Publication: 2012
Authors: Sarit Chakraborty, Bikromadittya Mondal
10.5120/7274-0435

Sarit Chakraborty, Bikromadittya Mondal. Spam Mail Filtering Technique using Different Decision Tree Classifiers through Data Mining Approach - A Comparative Performance Analysis. International Journal of Computer Applications. 47, 16 (June 2012), 26-31. DOI=10.5120/7274-0435

@article{ 10.5120/7274-0435,
author = { Sarit Chakraborty and Bikromadittya Mondal },
title = { Spam Mail Filtering Technique using Different Decision Tree Classifiers through Data Mining Approach - A Comparative Performance Analysis },
journal = { International Journal of Computer Applications },
issue_date = { June 2012 },
volume = { 47 },
number = { 16 },
month = { June },
year = { 2012 },
issn = { 0975-8887 },
pages = { 26-31 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume47/number16/7274-0435/ },
doi = { 10.5120/7274-0435 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:42:02.350322+05:30
%A Sarit Chakraborty
%A Bikromadittya Mondal
%T Spam Mail Filtering Technique using Different Decision Tree Classifiers through Data Mining Approach - A Comparative Performance Analysis
%J International Journal of Computer Applications
%@ 0975-8887
%V 47
%N 16
%P 26-31
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

In recent years, the bulk of communication has taken place through e-mail, which is often subject to passive or active attacks. Effective spam filtering is therefore a pressing requirement. Many efficient spam filters are available today with differing degrees of performance, and their accuracy usually varies between 60% and 80% on average. However, most filtering techniques cannot keep up with the frequently changing tactics that spammers adopt over time. Improved spam control algorithms, or existing data mining algorithms tuned to their full potential, are therefore urgently needed. In this paper, three decision tree classification techniques from data mining, namely the Naïve Bayes Tree classifier (NBT), the C4.5 (or J48) decision tree classifier, and the Logistic Model Tree classifier (LMT), are studied and analyzed for spam mail filtering. The test results show that LMT performs best, detecting spam and non-spam (ham) mails with almost 90% accuracy.
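The accuracy figure used above to compare classifiers is the fraction of mails labelled correctly. The following is a minimal Python sketch of that comparison, not the authors' Weka setup: the labels, the names `classifier_a` and `classifier_b`, and the helper functions are all hypothetical, and the numbers are toy values rather than results from the paper.

```python
from typing import Dict, Sequence


def accuracy(actual: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of mails whose predicted label matches the actual label."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def confusion_counts(
    actual: Sequence[str], predicted: Sequence[str], positive: str = "spam"
) -> Dict[str, int]:
    """Count true/false positives and negatives, treating `positive` as spam."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts


# Hypothetical toy data: true labels for ten mails (not from the paper).
actual = ["spam", "ham", "spam", "ham", "ham",
          "spam", "ham", "spam", "ham", "ham"]

# Hypothetical predictions from two unnamed classifiers (assumed values).
predictions = {
    "classifier_a": ["spam", "ham", "ham", "ham", "ham",
                     "spam", "spam", "spam", "ham", "ham"],
    "classifier_b": ["spam", "ham", "spam", "ham", "ham",
                     "spam", "ham", "ham", "ham", "ham"],
}

# Score each classifier and pick the one with the highest accuracy.
scores = {name: accuracy(actual, pred) for name, pred in predictions.items()}
best = max(scores, key=scores.get)

if __name__ == "__main__":
    for name, score in scores.items():
        print(f"{name}: accuracy={score:.2f} "
              f"counts={confusion_counts(actual, predictions[name])}")
    print("best:", best)
```

Accuracy alone hides the difference between a spam mail that slips through (false negative) and a legitimate mail that is blocked (false positive), which is why the sketch also reports the confusion counts.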

References
  1. P. Sudhakar, G. Poonkuzhali, K. Thaigarajan, K. Sarukesi, International Journal of Computers, Issue 3, Volume 5, 2011, pp. 332-345.
  2. Almeida T, Yamakami A, Almeida J (2009) Evaluation of approaches for dimensionality reduction applied with Naive Bayes anti-spam filters. In: Proceedings of the 8th IEEE international conference on machine learning and applications, Miami, FL, USA, pp 517–522.
  3. Cormack G (2008) Email spam filtering: a systematic review. Found Trends Inf Retr 1(4):335–455.
  4. Machine Learning Techniques in Spam Filtering, Konstantin Tretyakov, kt@ut.ee, Institute of Computer Science, University of Tartu, Data Mining Problem-oriented Seminar, MTAT.03.177, May 2004, pp. 60-79.
  5. A Study on Email Spam Filtering Techniques, Christina V et al. International Journal of Computer Applications (0975 – 8887) Volume 12 – No. 1, December 2010, pp. 07-09.
  6. Adaptive Spam Mail Filtering Using Genetic Algorithm, Sanpakdee U et al. Advanced Communication Technology, 2006. ICACT 2006. The 8th International Conference, 20-22 Feb. 2006, Vol 1, 441-445.
  7. J. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1992.
  8. V. Christina et al. Email Spam Filtering using Supervised Machine Learning Techniques. International Journal on Computer Science and Engineering (IJCSE) Vol. 02, No. 09, 2010, 3126-3129.
  9. Ahmed Khorsi, "An Overview of Content-based Spam Filtering Techniques", Informatica, vol. 31, no. 3, October 2007, pp 269-277.
  10. Weka. WEKA (Data Mining Software). Available at http://www.cs.waikato.ac.nz/ml/weka/. 2006.
Index Terms

Computer Science
Information Sciences

Keywords

Spam, Ham, Data Mining, Naïve Bayes Tree, Logistic Model Tree, Machine learning, Spam-score