Research Article

Tuned Artificial Neural Network Model for E-mail Data Classification with Feature Selection

by H. S. Hota, Akhilesh Kumar Shrivas, S. K. Singhai
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 67 - Number 25
Year of Publication: 2013
10.5120/11744-7322

H. S. Hota, Akhilesh Kumar Shrivas, S. K. Singhai. Tuned Artificial Neural Network Model for E-mail Data Classification with Feature Selection. International Journal of Computer Applications. 67, 25 (April 2013), 20-25. DOI=10.5120/11744-7322

@article{ 10.5120/11744-7322,
author = { H. S. Hota, Akhilesh Kumar Shrivas, S. K. Singhai },
title = { Tuned Artificial Neural Network Model for E-mail Data Classification with Feature Selection },
journal = { International Journal of Computer Applications },
issue_date = { April 2013 },
volume = { 67 },
number = { 25 },
month = { April },
year = { 2013 },
issn = { 0975-8887 },
pages = { 20-25 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume67/number25/11744-7322/ },
doi = { 10.5120/11744-7322 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:26:25.494124+05:30
%A H. S. Hota
%A Akhilesh Kumar Shrivas
%A S. K. Singhai
%T Tuned Artificial Neural Network Model for E-mail Data Classification with Feature Selection
%J International Journal of Computer Applications
%@ 0975-8887
%V 67
%N 25
%P 20-25
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

With the rapid development of the Internet, e-mail has become an effective means of communication for sharing information. Through e-mail, we can send text messages, images, and audio and video clips across the world in very little time. In recent years, e-mail users have faced problems caused by spam e-mails, which are unsolicited commercial or bulk e-mails sent by spammers. Spam e-mails bring serious problems; for example, a message may contain a hyperlink leading to a bogus website that asks for personal information such as a username, password, or bank account number. Spam e-mail wastes not only storage space but also time. To tackle these problems, e-mails must be classified with an intelligent and robust classifier that can distinguish spam from non-spam e-mail. The performance of a spam e-mail classifier can be greatly enhanced by using an artificial neural network classification algorithm. An Artificial Neural Network (ANN) is a powerful tool for data classification that can learn from large amounts of high-dimensional data. Several ANN parameters must be tuned for better model performance, namely the learning rate, the network architecture, and the momentum; all of these play a very important role in improving the accuracy of the ANN model. In this paper, Error Back Propagation Network (EBPN) techniques based on ANN are explored with learning rate values from 0.2 to 0.9. An EBPN model is derived from the e-mail data set obtained from the UCI repository, using three different partitions. Because of the high dimensionality of the data set, we applied a feature selection technique to find the best model. The model was tested with various combinations of features, and it achieved its highest accuracy of 98.49% on testing samples with 52 features. The derived model was also evaluated with precision, recall, and F-measure, achieving 98.34%, 99.07%, and 98.70%, respectively.
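The reported precision, recall, and F-measure can be related with a short sketch. It assumes the standard definitions, with spam treated as the positive class and F-measure taken as the harmonic mean of precision and recall; the abstract does not state either choice, and the function names and the toy confusion counts below are hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F-measure) for the positive class.

    tp, fp, fn are true-positive, false-positive and false-negative counts.
    Treating spam as the positive class is an assumption of this sketch.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


def f_measure_from(precision, recall):
    """Harmonic mean of precision and recall (the usual F1 definition)."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy counts, not taken from the paper.
p, r, f = precision_recall_f1(tp=8, fp=2, fn=2)

# Consistency check on the figures quoted in the abstract:
# precision 98.34% and recall 99.07% should give an F-measure near 98.70%.
print(f"{f_measure_from(0.9834, 0.9907):.4f}")  # prints 0.9870
```

Under these assumptions the three reported figures are mutually consistent: the harmonic mean of 98.34% and 99.07% rounds to 98.70%.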

Index Terms

Computer Science
Information Sciences

Keywords

Spam e-mail, Classification, Error Back Propagation Network (EBPN), Feature Selection