Research Article

Recognizing Spam Domains by Extracting Features from Spam Emails using Data Mining

by Kavita Patel
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 90 - Number 8
Year of Publication: 2014
Authors: Kavita Patel
10.5120/15595-4341

Kavita Patel. Recognizing Spam Domains by Extracting Features from Spam Emails using Data Mining. International Journal of Computer Applications. 90, 8 (March 2014), 25-30. DOI=10.5120/15595-4341

@article{ 10.5120/15595-4341,
author = { Kavita Patel },
title = { Recognizing Spam Domains by Extracting Features from Spam Emails using Data Mining },
journal = { International Journal of Computer Applications },
issue_date = { March 2014 },
volume = { 90 },
number = { 8 },
month = { March },
year = { 2014 },
issn = { 0975-8887 },
pages = { 25-30 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume90/number8/15595-4341/ },
doi = { 10.5120/15595-4341 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:10:31.764965+05:30
%A Kavita Patel
%T Recognizing Spam Domains by Extracting Features from Spam Emails using Data Mining
%J International Journal of Computer Applications
%@ 0975-8887
%V 90
%N 8
%P 25-30
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This paper attempts to develop an algorithm that recognizes spam domains using data mining techniques, with a focus on forensic analysis for law enforcement. Spam filtering has been the major weapon against spam, but it has failed to reduce the number of spam emails sent to indiscriminate sets of recipients. The proposed algorithm takes spam emails from a personal account as input and extracts features such as stylistic features, semantic features, related email subjects, and the URLs present in the emails. The emails are clustered on each feature individually, and the resulting clusters are evaluated. The clusters are then mapped to their respective domains. A spam domain here is the domain of the URL of the web page that the spammer is trying to promote. The WHOIS information of a domain helps to identify the source of that domain. The result for each feature is measured by its overall purity and by the number of emails in its highest-purity cluster. Experimental results show that clustering spam emails by stylistic and semantic features yields clusters that are 20% less pure than clustering by the other two features.
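
The abstract names purity as the evaluation measure but does not define it. The sketch below assumes the standard textbook definition (the share of a cluster's members that carry its most common label, with overall purity as the size-weighted average) and uses the domain of each email's URL as the label. The helper names and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse


def domain_of(url):
    """Return the lowercased host name of a URL, without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def cluster_purity(labels):
    """Fraction of a cluster's members that share its most common label."""
    return Counter(labels).most_common(1)[0][1] / len(labels)


def overall_purity(clusters):
    """Size-weighted purity over all clusters (dict: cluster id -> labels)."""
    total = sum(len(labels) for labels in clusters.values())
    majority = sum(
        Counter(labels).most_common(1)[0][1] for labels in clusters.values()
    )
    return majority / total


# Hypothetical toy data: (cluster id assigned by one feature, URL in the email).
emails = [
    (0, "http://www.pills-example.test/buy"),
    (0, "http://pills-example.test/offer"),
    (0, "http://watches-example.test/"),
    (1, "http://watches-example.test/sale"),
    (1, "http://watches-example.test/new"),
]

# Map every clustered email to the domain it promotes.
clusters = defaultdict(list)
for cluster_id, url in emails:
    clusters[cluster_id].append(domain_of(url))

per_cluster = {cid: cluster_purity(labels) for cid, labels in clusters.items()}
purest = max(per_cluster, key=per_cluster.get)  # cluster with highest purity
purest_size = len(clusters[purest])             # number of emails it holds
```

Running the same computation once per feature would give the two figures the abstract compares: the overall purity, and the number of emails in the purest cluster.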

Index Terms

Computer Science
Information Sciences

Keywords

Spam, Semantics, Stylistics, Data Mining, Clustering