Research Article

Filtering Template driven spam mails using Vector Space models

by Liny Varghese, Supriya M.H, K. Poulose Jacob
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 39 - Number 14
Year of Publication: 2012
Authors: Liny Varghese, Supriya M.H, K. Poulose Jacob
10.5120/4891-7383

Liny Varghese, Supriya M.H, K. Poulose Jacob. Filtering Template driven spam mails using Vector Space models. International Journal of Computer Applications. 39, 14 (February 2012), 33-35. DOI=10.5120/4891-7383

@article{ 10.5120/4891-7383,
author = { Liny Varghese and Supriya M.H and K. Poulose Jacob },
title = { Filtering Template driven spam mails using Vector Space models },
journal = { International Journal of Computer Applications },
issue_date = { February 2012 },
volume = { 39 },
number = { 14 },
month = { February },
year = { 2012 },
issn = { 0975-8887 },
pages = { 33-35 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume39/number14/4891-7383/ },
doi = { 10.5120/4891-7383 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:26:29.544090+05:30
%A Liny Varghese
%A Supriya M.H
%A K. Poulose Jacob
%T Filtering Template driven spam mails using Vector Space models
%J International Journal of Computer Applications
%@ 0975-8887
%V 39
%N 14
%P 33-35
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Spam has become a major problem for society. Some spammers use templates to send spam: to run a particular promotion, they create a template and merge each recipient's details into it. Similarities can therefore be found among these mails and used to reject forthcoming spam. Most high-volume spam is sent using tools that randomize parts of the message, such as the subject, body, and sender address. The general form of the template a spammer is using can often be guessed by inspecting the features of the messages. Most spam filters are either rule-based or Bayesian models. The main objective of this paper is to measure semantic distance and to evaluate the applicability of two information retrieval techniques, the simple Vector Space Model (VSM) and VSM with Rocchio classification, in the spam context. Both methods use cosine similarity to identify spam.
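
The two techniques named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenizer, raw term-frequency weights, the 0.8 threshold, and all function names and sample messages are assumptions chosen for illustration. The simple VSM check compares a message with each known spam message by cosine similarity, while the Rocchio-style check compares it with the centroid (mean vector) of each class and picks the closer one.

```python
import math
from collections import Counter


def tf_vector(text):
    """Term-frequency vector from a naive whitespace tokenizer (an assumption)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term -> weight mappings."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Mean vector of a class, as used in Rocchio classification."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: w / len(vectors) for t, w in total.items()}


def simple_vsm_is_spam(message, known_spam, threshold=0.8):
    """Flag a message whose best match among known spam reaches the threshold.

    The threshold value is illustrative, not taken from the paper.
    """
    v = tf_vector(message)
    return max(cosine(v, tf_vector(s)) for s in known_spam) >= threshold


def rocchio_label(message, spam_centroid, ham_centroid):
    """Assign the class whose centroid is more similar to the message."""
    v = tf_vector(message)
    if cosine(v, spam_centroid) > cosine(v, ham_centroid):
        return "spam"
    return "ham"


# Hypothetical training data: two mails produced from one spam template,
# plus two legitimate mails.
spam = [
    "dear alice you won a free prize claim now",
    "dear bob you won a free prize claim now",
]
ham = [
    "meeting moved to friday please confirm",
    "lunch on friday with the project team",
]
spam_c = centroid([tf_vector(s) for s in spam])
ham_c = centroid([tf_vector(h) for h in ham])

templated = "dear carol you won a free prize claim now"
legit = "please confirm the friday meeting"

print(simple_vsm_is_spam(templated, spam), rocchio_label(templated, spam_c, ham_c))
print(simple_vsm_is_spam(legit, spam), rocchio_label(legit, spam_c, ham_c))
```

Because a template changes only a few tokens per recipient, a new mail from the same template keeps a high cosine similarity to earlier ones, which is the property both checks rely on. A practical filter would likely use TF-IDF weights and a threshold tuned on labeled data.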

References
  1. G. Salton, A. Wong, and C. S. Yang, "A Vector Space Model for Automatic Indexing," Communications of the ACM, vol. 18, no. 11, pages 613–620 (1975).
  2. J. J. Rocchio. Relevance feedback in information retrieval. In G. Salton, editor, The SMART Retrieval System: Experiments in Automatic Document Processing, Prentice-Hall Series in Automatic Computation, chapter 14, pages 313–323. Prentice-Hall, Englewood Cliffs NJ, 1971.
  3. Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze: An Introduction to Information Retrieval, page 181. Cambridge University Press, 2009.
  4. Wilfried N. Gansterer, Andreas G. K. Janecek, Robert Neumayer, Spam Filtering Based on Latent Semantic Indexing
  5. http://en.wikipedia.org/wiki/Vector_space_model viewed on January 2012
  6. Tuomo Korenius, Jorma Laurikkala, Martti Juhola, On principal component analysis, cosine and Euclidean measures in information retrieval, Information Sciences, Volume 177, Issue 22, 15 November 2007, Pages 4893-4905, ISSN 0020-0255
  7. Congnan Luo, Yanjun Li, Soon M. Chung, Text document clustering based on neighbors, Data & Knowledge Engineering, Volume 68, Issue 11, November 2009, Pages 1271-1288, ISSN 0169-023X, 10.1016/j.datak.2009.06.007.
  8. Angel R. Martinez, Data Mining of Text Files, In: C.R. Rao, E.J. Wegman and J.L. Solka, Editor(s), Handbook of Statistics, Elsevier, 2005, Volume 24, Pages 109-131, ISSN 0169-7161, ISBN 9780444511416, 10.1016/S0169-7161(04)24004-4.
  9. Thamarai Subramaniam, Hamid A. Jalab and Alaa Y. Taqa, Overview of textual anti-spam filtering techniques, International Journal of the Physical Sciences Vol. 5(12), pp. 1869-1882, 4 October, 2010
Index Terms

Computer Science
Information Sciences

Keywords

Spam, vector space models, Rocchio classification, cosine similarity