Call for Paper - March 2023 Edition
IJCA solicits original research papers for the March 2023 Edition. Last date of manuscript submission is February 20, 2023. Read More

Spam Filtering through Multiple Pattern Bit Parallel String Matching Combining Shift AND & OR

Print
PDF
International Journal of Computer Applications
© 2013 by IJCA Journal
Volume 61 - Number 5
Year of Publication: 2013
Authors:
Vidya Saikrishna
Akhtar Rasool
Nilay Khare
10.5120/9926-4552

Vidya Saikrishna, Akhtar Rasool and Nilay Khare. Article: Spam Filtering through Multiple Pattern Bit Parallel String Matching Combining Shift AND & OR. International Journal of Computer Applications 61(5):40-45, January 2013. Full text available. BibTeX

@article{key:article,
	author = {Vidya Saikrishna and Akhtar Rasool and Nilay Khare},
	title = {Article: Spam Filtering through Multiple Pattern Bit Parallel String Matching Combining Shift AND & OR},
	journal = {International Journal of Computer Applications},
	year = {2013},
	volume = {61},
	number = {5},
	pages = {40-45},
	month = {January},
	note = {Full text available}
}

Abstract

Spam refers to unsolicited, unwanted and inappropriate bulk email. Spam filtering has become conspicuous as they consume a lot of network bandwidth, overloads the email server and drops the productivity of global economy. Content based spam filtering is accomplished with the help of multiple pattern string matching algorithm. Traditionally Aho Corasick algorithm was used to filter spam which constructs a trie of the spam keywords. The performance degrades in the context of time as well as space as the size of trie increases with the growing spam keywords count. To counterbalance time and space loss, bit parallel multiple pattern string matching algorithm using Shift OR method is used. The method acts as filter performing approximate string matching . This implies that there are some false matches detected by the filter which requires verification. The proposed method for filtering spams has been developed using a combination of Shift AND and OR operation. The method directly works on spam keywords of equal size whereas for unequal size keywords, a new proposed equal size grouping method is developed. Both method shows improvement over the Aho Corasick algorithm in context of space complexity and also behaves as an efficient filter and reducing the number of false matches as present in Shift OR method.

References

  • Leena Salmela, J. Tarhio and J. Kytojoki "MultiPattern String Matching with Very Large Pattern Sets", ACM Journal of Experimental Algorithmics, Volume 11, 2006. Rajesh Prasad, Anuj Kumar Sharma and Alok Singh.
  • Rajesh Prasad, Anuj Kumar Sharma and Alok Singh. "Efficient bit parallel multiple Pattern approximate string matching algorithm", Academic Journals , Scientific Research and Essays, Vol(6), pp876-881, February 2011.
  • Gonzalo Navarro and Mathieu Raffinot. A Bit Parallel approach to Suffix Automata : Fast Extended String Matching. In M. Farach (editor), Proc. CPM'98, LNCS 1448. Pages 14-33, 1998.
  • G. Navarro,M. Raffinot, Fast and flexible string matching by combining bit-parallelism and suffix automata,ACM J. Experimental Algorithmics (JEA) 5 (4) (2000).
  • David E. Culler, Jaswinder Pal Singh, Anoop Gupta. Parallel Computer Architecture - A Hardware/Software Approach. Morgan Kaufmann Publishers, 1999. ISBN 1-55860-343-3, pg 15
  • "The Two Percent Solution" by Jim Turley 2002
  • http://en. wikipedia. org/wiki/string_searching_algorthim
  • Baeza-Yates RA, Gonnet GH (1992). A new approach to text searching. Commun. ACM. , 35(10): 74-82.
  • Dylan Mors and Dermot Harnett (2009)
  • Paul Graham (2002) A Plan for Spam