Recognizing Spam Domains by Extracting Features from Spam Emails using Data Mining

Kavita Patel


Research Article


International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 90 - Number 8

Year of Publication: 2014

Authors: Kavita Patel

DOI: 10.5120/15595-4341

Kavita Patel. Recognizing Spam Domains by Extracting Features from Spam Emails using Data Mining. International Journal of Computer Applications 90, 8 (March 2014), 25-30. DOI=10.5120/15595-4341

@article{10.5120/15595-4341,
  author = {Kavita Patel},
  title = {Recognizing Spam Domains by Extracting Features from Spam Emails using Data Mining},
  journal = {International Journal of Computer Applications},
  issue_date = {March 2014},
  volume = {90},
  number = {8},
  month = {March},
  year = {2014},
  issn = {0975-8887},
  pages = {25-30},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume90/number8/15595-4341/},
  doi = {10.5120/15595-4341},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T22:10:31.764965+05:30
%A Kavita Patel
%T Recognizing Spam Domains by Extracting Features from Spam Emails using Data Mining
%J International Journal of Computer Applications
%@ 0975-8887
%V 90
%N 8
%P 25-30
%D 2014
%I Foundation of Computer Science (FCS), NY, USA

Abstract

This paper develops an algorithm that recognizes spam domains using data mining techniques, with a focus on forensic analysis for law enforcement. Spam filtering has been the main weapon against spam, but it has failed to reduce the number of spam emails sent to indiscriminate sets of recipients. The proposed algorithm takes as input the spam emails received by a personal account and extracts four kinds of features: stylistic features, semantic features, email subjects, and the URLs present in the emails. Each feature is clustered separately and the resulting clusters are evaluated. The clusters are then mapped to their respective domains; a spam domain here is the domain of the web page that the spammer is trying to promote. The WHOIS information for a domain helps trace the source of that domain. The clustering on each individual feature is evaluated by its overall purity and by the number of emails in the cluster with the highest purity. Experimental results show that clustering spam emails by stylistic and semantic features yields clusters about 20% less pure than clustering by the other two features.
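The abstract names the steps but not the exact clustering algorithm or purity formula, so the following is only a hypothetical Python sketch of two of them, not the author's implementation. It extracts the host names of the URLs in each spam body, groups emails that share any domain (a simple connected-components grouping assumed here for illustration), and scores the clusters with the standard purity measure, overall purity = (1/N) * sum over clusters c_k of max_j |c_k ∩ t_j|, where t_j are ground-truth groups (for example, hand-labelled campaigns). The function names, the URL pattern, the toy data, and the choice to break purity ties in favour of the larger cluster are all assumptions.

```python
import re
from collections import Counter, defaultdict
from urllib.parse import urlparse

# Rough pattern for http(s) URLs in a plain-text body (an assumption; URLs
# hidden in HTML markup or obfuscated text are not handled here).
URL_RE = re.compile(r"https?://[^\s\"'<>\[\]]+", re.IGNORECASE)


def extract_domains(body: str) -> set[str]:
    """Return the lower-cased host names of the http(s) URLs in an email body."""
    domains = set()
    for url in URL_RE.findall(body):
        # Drop trailing punctuation that the pattern may have swallowed.
        host = urlparse(url.rstrip(".,;:!?)")).hostname  # None if no host part
        if host:
            domains.add(host)
    return domains


def cluster_by_shared_domain(bodies: list[str]) -> list[set[int]]:
    """Hypothetical grouping: emails sharing any URL domain land in one cluster."""
    parent = list(range(len(bodies)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen: dict[str, int] = {}  # domain -> first email index containing it
    for idx, body in enumerate(bodies):
        for dom in extract_domains(body):
            if dom in first_seen:
                parent[find(idx)] = find(first_seen[dom])  # merge the two groups
            else:
                first_seen[dom] = idx

    groups: dict[int, set[int]] = defaultdict(set)
    for idx in range(len(bodies)):
        groups[find(idx)].add(idx)
    return list(groups.values())


def purity(clusters: list[set[int]], labels: list[str]) -> tuple[float, int]:
    """Return (overall purity, size of the purest cluster).

    labels[i] is the ground-truth group of email i. Ties in purity are broken
    in favour of the larger cluster (an assumption).
    """
    n = sum(len(c) for c in clusters)
    if n == 0:
        raise ValueError("no emails to score")
    majority_total = 0
    best = (-1.0, 0)  # (purity, size) of the purest cluster seen so far
    for c in clusters:
        majority = Counter(labels[i] for i in c).most_common(1)[0][1]
        majority_total += majority
        candidate = (majority / len(c), len(c))
        if candidate > best:
            best = candidate
    return majority_total / n, best[1]


if __name__ == "__main__":
    # Toy data: hypothetical spam bodies with hand-assigned campaign labels.
    bodies = [
        "Cheap pills at http://pills.example/buy now",
        "Visit http://pills.example/offer today!",
        "Win big: https://casino.example.net/join",
        "No link here",
        "Pills and casino: http://pills.example and http://casino.example.net",
    ]
    labels = ["pharma", "pharma", "casino", "other", "pharma"]
    clusters = cluster_by_shared_domain(bodies)
    print(clusters)
    print(purity(clusters, labels))
```

With this toy data, the last email links pills.example and casino.example.net, so emails 0, 1, 2 and 4 form one cluster whose majority label covers 3 of its 4 emails, while email 3 (no URL) stays alone; overall purity is (3 + 1) / 5 = 0.8. The domains found in each cluster could then be looked up in WHOIS, which this sketch does not attempt.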


Index Terms

Computer Science

Information Sciences

Keywords

Spam, Semantics, Stylistics, Data Mining, Clustering