CFP last date
22 April 2024
Reseach Article

Spam Filtering through Multiple Pattern Bit Parallel String Matching Combining Shift AND & OR

by Vidya Saikrishna, Akhtar Rasool, Nilay Khare
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 61 - Number 5
Year of Publication: 2013
Authors: Vidya Saikrishna, Akhtar Rasool, Nilay Khare
10.5120/9926-4552

Vidya Saikrishna, Akhtar Rasool, Nilay Khare . Spam Filtering through Multiple Pattern Bit Parallel String Matching Combining Shift AND & OR. International Journal of Computer Applications. 61, 5 ( January 2013), 40-45. DOI=10.5120/9926-4552

@article{ 10.5120/9926-4552,
author = { Vidya Saikrishna, Akhtar Rasool, Nilay Khare },
title = { Spam Filtering through Multiple Pattern Bit Parallel String Matching Combining Shift AND & OR },
journal = { International Journal of Computer Applications },
issue_date = { January 2013 },
volume = { 61 },
number = { 5 },
month = { January },
year = { 2013 },
issn = { 0975-8887 },
pages = { 40-45 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume61/number5/9926-4552/ },
doi = { 10.5120/9926-4552 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:08:51.866822+05:30
%A Vidya Saikrishna
%A Akhtar Rasool
%A Nilay Khare
%T Spam Filtering through Multiple Pattern Bit Parallel String Matching Combining Shift AND & OR
%J International Journal of Computer Applications
%@ 0975-8887
%V 61
%N 5
%P 40-45
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Spam refers to unsolicited, unwanted and inappropriate bulk email. Spam filtering has become conspicuous as they consume a lot of network bandwidth, overloads the email server and drops the productivity of global economy. Content based spam filtering is accomplished with the help of multiple pattern string matching algorithm. Traditionally Aho Corasick algorithm was used to filter spam which constructs a trie of the spam keywords. The performance degrades in the context of time as well as space as the size of trie increases with the growing spam keywords count. To counterbalance time and space loss, bit parallel multiple pattern string matching algorithm using Shift OR method is used. The method acts as filter performing approximate string matching . This implies that there are some false matches detected by the filter which requires verification. The proposed method for filtering spams has been developed using a combination of Shift AND and OR operation. The method directly works on spam keywords of equal size whereas for unequal size keywords, a new proposed equal size grouping method is developed. Both method shows improvement over the Aho Corasick algorithm in context of space complexity and also behaves as an efficient filter and reducing the number of false matches as present in Shift OR method.

References
  1. Leena Salmela, J. Tarhio and J. Kytojoki "MultiPattern String Matching with Very Large Pattern Sets", ACM Journal of Experimental Algorithmics, Volume 11, 2006. Rajesh Prasad, Anuj Kumar Sharma and Alok Singh.
  2. Rajesh Prasad, Anuj Kumar Sharma and Alok Singh. "Efficient bit parallel multiple Pattern approximate string matching algorithm", Academic Journals , Scientific Research and Essays, Vol(6), pp876-881, February 2011.
  3. Gonzalo Navarro and Mathieu Raffinot. A Bit Parallel approach to Suffix Automata : Fast Extended String Matching. In M. Farach (editor), Proc. CPM'98, LNCS 1448. Pages 14-33, 1998.
  4. G. Navarro,M. Raffinot, Fast and flexible string matching by combining bit-parallelism and suffix automata,ACM J. Experimental Algorithmics (JEA) 5 (4) (2000).
  5. David E. Culler, Jaswinder Pal Singh, Anoop Gupta. Parallel Computer Architecture - A Hardware/Software Approach. Morgan Kaufmann Publishers, 1999. ISBN 1-55860-343-3, pg 15
  6. "The Two Percent Solution" by Jim Turley 2002
  7. http://en. wikipedia. org/wiki/string_searching_algorthim
  8. Baeza-Yates RA, Gonnet GH (1992). A new approach to text searching. Commun. ACM. , 35(10): 74-82.
  9. Dylan Mors and Dermot Harnett (2009)
  10. Paul Graham (2002) A Plan for Spam
Index Terms

Computer Science
Information Sciences

Keywords

Spam filtering String matching Bit parallelism