Serial and Parallel Bayesian Spam Filtering using Aho-Corasick and PFAC

International Journal of Computer Applications
© 2013 by IJCA Journal
Volume 74 - Number 17
Year of Publication: 2013
Saima Haseeb
Mahak Motwani
Amit Saxena

Saima Haseeb, Mahak Motwani and Amit Saxena. Article: Serial and Parallel Bayesian Spam Filtering using Aho-Corasick and PFAC. International Journal of Computer Applications 74(17):9-14, July 2013. Full text available. BibTeX

	author = {Saima Haseeb and Mahak Motwani and Amit Saxena},
	title = {Article: Serial and Parallel Bayesian Spam Filtering using Aho-Corasick and PFAC},
	journal = {International Journal of Computer Applications},
	year = {2013},
	volume = {74},
	number = {17},
	pages = {9-14},
	month = {July},
	note = {Full text available}


With the rapid growth of the Internet, e-mail has become an important means of communication in daily life because it is convenient, efficient, and cheap. It also brings spam. Spam e-mails, also known as 'junk e-mails', are unsolicited messages sent in bulk with hidden or forged sender identity, address, and header information. More effective spam filtering approaches are needed to keep e-mail systems operating normally and to protect the interests of e-mail users. In this paper we develop a spam filter based on the Bayesian filtering method that uses the Aho-Corasick and PFAC string matching algorithms. The filter improves on traditional Bayesian spam filtering, with the aim of raising filtering efficiency and reducing the chance of misjudging malignant spam. To speed up the filtering process further, we transform the filter into a parallel spam filter on GPGPUs using the PFAC algorithm.
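
The pipeline summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the token probabilities, the function names, and the equal-prior naive Bayes combination are assumptions made for illustration, and the PFAC step is simulated serially on the CPU (on a GPU, each start position would be handled by its own thread). Matching is on raw substrings, so no tokenization is assumed.

```python
from collections import deque
from math import exp, log


def build_automaton(patterns):
    """Build a trie plus Aho-Corasick failure links for the given tokens."""
    goto, fail, ends = [{}], [0], [[]]
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                ends.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        ends[state].append(pat)  # pattern ends exactly at this trie node
    out = [list(e) for e in ends]  # outputs merged along failure links
    queue = deque(goto[0].values())  # depth-1 states fail to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, ends, out


def ac_find(text, automaton):
    """Serial Aho-Corasick: one pass over the text using failure links."""
    goto, fail, _ends, out = automaton
    found, state = set(), 0
    for ch in text:
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        found.update(out[state])
    return found


def pfac_find(text, automaton):
    """PFAC-style matching: no failure links, one trie walk per start position."""
    goto, _fail, ends, _out = automaton
    found = set()
    for start in range(len(text)):  # on a GPU: one thread per start position
        state = 0
        for i in range(start, len(text)):
            state = goto[state].get(text[i])
            if state is None:  # no transition: this walk terminates
                break
            found.update(ends[state])
    return found


def spam_probability(tokens, token_spam_prob):
    """Combine per-token spam probabilities (naive Bayes, equal priors assumed)."""
    log_spam = sum(log(token_spam_prob[t]) for t in tokens)
    log_ham = sum(log(1.0 - token_spam_prob[t]) for t in tokens)
    return 1.0 / (1.0 + exp(log_ham - log_spam))


# Hypothetical per-token spam probabilities; real values would come from training.
TOKEN_SPAM_PROB = {"free": 0.9, "winner": 0.95, "meeting": 0.1}
automaton = build_automaton(list(TOKEN_SPAM_PROB))
message = "you are a winner, claim your free prize"
tokens = ac_find(message, automaton)
score = spam_probability(tokens, TOKEN_SPAM_PROB)
```

Both matchers return the same set of tokens. The difference is in how the work is organized: the Aho-Corasick scan is inherently sequential because each step depends on the previous state, whereas the failureless walks are independent of one another, which is what makes the PFAC formulation a natural fit for a GPGPU.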

