Research Article

Improved Spam Detection using DBSCAN and Advanced Digest Algorithm

by Alaa H. Ahmed, Mohammad Mikki
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 69 - Number 25
Year of Publication: 2013
Authors: Alaa H. Ahmed, Mohammad Mikki
10.5120/12126-8300

Alaa H. Ahmed, Mohammad Mikki. Improved Spam Detection using DBSCAN and Advanced Digest Algorithm. International Journal of Computer Applications. 69, 25 (May 2013), 11-16. DOI=10.5120/12126-8300

@article{ 10.5120/12126-8300,
author = { Alaa H. Ahmed and Mohammad Mikki },
title = { Improved Spam Detection using DBSCAN and Advanced Digest Algorithm },
journal = { International Journal of Computer Applications },
issue_date = { May 2013 },
volume = { 69 },
number = { 25 },
month = { May },
year = { 2013 },
issn = { 0975-8887 },
pages = { 11-16 },
numpages = { 6 },
url = { https://ijcaonline.org/archives/volume69/number25/12126-8300/ },
doi = { 10.5120/12126-8300 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:31:16.355745+05:30
%A Alaa H. Ahmed
%A Mohammad Mikki
%T Improved Spam Detection using DBSCAN and Advanced Digest Algorithm
%J International Journal of Computer Applications
%@ 0975-8887
%V 69
%N 25
%P 11-16
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

E-mail is one of the most popular and frequently used means of communication because of its worldwide accessibility, relatively fast message transfer, and low sending cost. Detection and filtering remain the most feasible ways of fighting spam e-mail, and many reasonably successful spam filters are in operation. Identifying spam therefore plays an important role in current anti-spam mechanisms. To improve the accuracy of spam detection, this paper presents a filtering technique based on an improved digest algorithm and the DBSCAN clustering algorithm. With this technique, e-mails are represented by digests produced with the improved digest algorithm and then clustered with DBSCAN. Similar e-mails, which are consistently categorized as spam, are identified and clustered together, whereas good e-mails, which do not resemble other e-mails, are left unclustered. The method improves filtering accuracy by 30% over the most recently proposed algorithms and increases the resistance of spam detection to growing obfuscation efforts by spammers, while keeping the misdetection of good e-mails at a level similar to that of older filtering methods.
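
The digest-then-cluster pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `digest` function is a simplified, Nilsimsa-like stand-in for the paper's improved digest algorithm, the 64-bit width (`DIGEST_BITS`) and the `eps` and `min_pts` defaults are hypothetical values chosen for illustration, and `dbscan` is a plain textbook version of the algorithm that uses Hamming distance between digests.

```python
import zlib
from collections import Counter

DIGEST_BITS = 64  # hypothetical digest width, chosen only for illustration
NOISE = -1        # label for points that belong to no cluster


def digest(text: str) -> int:
    """Toy Nilsimsa-like digest (an assumption, not the paper's algorithm).

    Character trigrams are hashed into DIGEST_BITS buckets; a bit is set
    for every bucket whose count exceeds the mean bucket count.
    """
    normalized = " ".join(text.lower().split())
    buckets = Counter(
        zlib.crc32(normalized[i:i + 3].encode("utf-8")) % DIGEST_BITS
        for i in range(max(len(normalized) - 2, 0))
    )
    mean = sum(buckets.values()) / DIGEST_BITS
    return sum(1 << b for b, count in buckets.items() if count > mean)


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two digests differ."""
    return bin(a ^ b).count("1")


def dbscan(points, eps, min_pts, dist=hamming):
    """Textbook DBSCAN: returns one cluster label per point, NOISE if none."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j in range(len(points))
                if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # core point: keep expanding
                queue.extend(j_neighbors)
    return labels


def cluster_emails(messages, eps=8, min_pts=3):
    """Digest every message, then cluster the digests with DBSCAN."""
    return dbscan([digest(m) for m in messages], eps, min_pts)
```

In this sketch, messages that receive a cluster label correspond to the similar e-mails the abstract treats as spam, and messages labeled `NOISE` correspond to the unclustered good e-mails. Suitable values for `eps` and `min_pts` depend on the digest and the corpus and would have to be tuned.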

Index Terms

Computer Science
Information Sciences

Keywords

DBSCAN Digest Based Nilsimsa spam clustering