Improved Spam Detection using DBSCAN and Advanced Digest Algorithm

International Journal of Computer Applications
© 2013 by IJCA Journal
Volume 69 - Number 25
Year of Publication: 2013
Alaa H. Ahmed
Mohammad Mikki

Alaa H Ahmed and Mohammad Mikki. Article: Improved Spam Detection using DBSCAN and Advanced Digest Algorithm. International Journal of Computer Applications 69(25):11-16, May 2013. Full text available. BibTeX

	author = {Alaa H. Ahmed and Mohammad Mikki},
	title = {Article: Improved Spam Detection using DBSCAN and Advanced Digest Algorithm},
	journal = {International Journal of Computer Applications},
	year = {2013},
	volume = {69},
	number = {25},
	pages = {11-16},
	month = {May},
	note = {Full text available}


E-mail is one of the most popular and frequently used means of communication because of its worldwide accessibility, relatively fast message transfer, and low sending cost. Detection and filtering remain the most feasible ways of fighting spam, and many reasonably successful spam filters are in operation. Identifying spam therefore plays an important role in current anti-spam mechanisms. To improve the accuracy of spam detection, this paper presents a filtering technique based on the Improved Digest algorithm and the DBSCAN clustering algorithm. Emails are represented by digests computed with the Improved Digest algorithm and are then clustered with DBSCAN. Similar emails, which are typically spam, are identified and grouped into clusters, whereas good emails, which do not resemble other emails, are left unclustered. The method improves filtering accuracy by 30% over recently proposed algorithms and increases the resistance of spam detection to greater obfuscation effort by spammers, while keeping the misdetection of good emails at a level similar to that of older filtering methods.
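The two-stage pipeline described in the abstract (digest each email, then cluster the digests with DBSCAN) can be illustrated with a minimal sketch. The abstract does not give the details of the Improved Digest algorithm, so the `digest` function below is a stand-in and an assumption: a simhash-style 64-bit fingerprint over character trigrams. The names `digest`, `hamming`, `dbscan`, and `cluster_emails`, as well as the `eps` and `min_pts` defaults, are likewise hypothetical and are not taken from the paper.

```python
import hashlib

NOISE = -1   # label for emails that belong to no cluster
BITS = 64    # digest width in bits (assumption, not from the paper)


def digest(text):
    """Stand-in locality-sensitive digest: simhash over character trigrams.

    This is NOT the paper's Improved Digest algorithm, only an illustration
    of a digest in which similar texts tend to differ in few bits.
    """
    normalized = " ".join(text.lower().split())
    weights = [0] * BITS
    for i in range(max(len(normalized) - 2, 0)):
        trigram = normalized[i:i + 3].encode("utf-8")
        h = int.from_bytes(hashlib.md5(trigram).digest()[:8], "big")
        for b in range(BITS):
            weights[b] += 1 if (h >> b) & 1 else -1
    return sum(1 << b for b in range(BITS) if weights[b] > 0)


def hamming(a, b):
    """Number of bit positions in which two digests differ."""
    return bin(a ^ b).count("1")


def dbscan(points, eps, min_pts, dist=hamming):
    """Plain DBSCAN; returns one label per point, NOISE for unclustered ones."""
    labels = [None] * len(points)

    def neighbors(i):
        # A point counts as its own neighbor, as in the standard definition.
        return [j for j in range(len(points))
                if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: joins, does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # core point: keep expanding
    return labels


def cluster_emails(emails, eps=12, min_pts=3):
    """Digest every email, then cluster the digests by Hamming distance."""
    return dbscan([digest(e) for e in emails], eps, min_pts)
```

Under this sketch, emails that receive a cluster label would be treated as bulk (likely spam), while emails labeled `NOISE` resemble no other message and would be treated as good mail. The values of `eps` and `min_pts` would need tuning on a labeled corpus.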

