CFP last date
20 May 2024
Reseach Article

Improved Approximate Multiple-Pattern String Matching using Consecutive N-Grams

by Vidya Saikrishna, Sid Ray
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 81 - Number 2
Year of Publication: 2013
Authors: Vidya Saikrishna, Sid Ray
10.5120/13986-1996

Vidya Saikrishna, Sid Ray . Improved Approximate Multiple-Pattern String Matching using Consecutive N-Grams. International Journal of Computer Applications. 81, 2 ( November 2013), 26-31. DOI=10.5120/13986-1996

@article{ 10.5120/13986-1996,
author = { Vidya Saikrishna, Sid Ray },
title = { Improved Approximate Multiple-Pattern String Matching using Consecutive N-Grams },
journal = { International Journal of Computer Applications },
issue_date = { November 2013 },
volume = { 81 },
number = { 2 },
month = { November },
year = { 2013 },
issn = { 0975-8887 },
pages = { 26-31 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume81/number2/13986-1996/ },
doi = { 10.5120/13986-1996 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:55:03.080259+05:30
%A Vidya Saikrishna
%A Sid Ray
%T Improved Approximate Multiple-Pattern String Matching using Consecutive N-Grams
%J International Journal of Computer Applications
%@ 0975-8887
%V 81
%N 2
%P 26-31
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

String matching is to find all the occurrences of a given pattern in a large text, the strings being sequence of characters drawn from finite alphabet set. Multiple-Pattern string matching problem involves detection of all the patterns of the Multiple-Pattern set in the text. Shift OR algorithm which we call as the Standard Shift OR algorithm uses the concept of Bit Parallelism to perform approximate string matching. The algorithm as the name suggests performs approximate string matching which means that it finds out some false matches besides detecting correct matches. In other words the algorithm behaves as a filter. In this paper a modification of the standard Shift OR is proposed to improve the filtering efficiency of the standard Shift OR algorithm using the consecutive N-Grams of the patterns of the multiple-pattern set. The proposed method reads N characters of the text at once as compared to a single character in the standard Shift OR algorithm. The number of false matches reduces besides increasing the speed of matching. Extensive experiments have been performed with the algorithm on text and pattern of variable size and the results are compared with the standard Shift OR algorithm.

References
  1. Leena Salmela, J. Tarhio and J. Kytojoki, "Multiple Pattern String Matching with Q Grams", ACM Journal of Experimental Algorithmics , Vol. 11, Article No. 1. 1, 2006.
  2. Rajesh Prasad, Suneeta Agarwal, Ishadutta Yadav, Bharat Singh "Efficient Bit-Parallel Multi-Patterns String Matching Algorithms for Limited Expression", Compute '10 , Proceedings of Third Annual ACM Bangalore Conference, Article No. 10, 2010.
  3. Heikki Hyyr¨o, Kimmo Fredriksson Gonzalo Navarro, "Increased Bit-Parallelism for Approximate and Multiple String Matching", ACM Journal of Experimental Algorithmics, Vol 10, 2006.
  4. Gonzalo Navarro and Mathieu Raffinot. "A Bit Parallel approach to Suffix Automata :Fast Extended String Matching", In M. Farach (editor), Proc. CPM'98, LNCS 1448. pp. 14-33, 1998.
  5. G. Navarro,M. Raffinot, "Fast and Flexible String Matching by combining Bit-Parallelism and Suffix Automata,ACM J. Experimental Algorithmics (JEA) Vol. 5, Article No. 4 2000.
  6. M. Crochemore et al. , "A Bit-Parallel Suffix Automaton approach for (?, ?)-Matching in Music Retrieval", in Proc. 10th Internat. Symp. On String Processing and Information Retrieval (SPIRE'03), in: Lecture Notes in Computer. Sci. , vol. 2857, pp. 211–223.
  7. R. Baeza-Yates, G. Gonnet, "A New Approach to Text Searching", Comm. ACM 35 (10) pp. 74–82, 1992.
  8. Hannu Peltola and Jorma Tarhio , Alternative Algorithms for Bit-Parallel String Matching, String Processing and Information Retrieval, LNCS 2857 pp. 80-93, 2003.
  9. AHO, A. AND CORASICK, M. 1975. "Efficient String Matching: an aid to Bibliographic Search", Communications of the ACM 18, 6, pp. 333–340, 1975.
  10. HYYR¨O, H. AND NAVARRO, G. 2002. "Faster Bit-Parallel Approximate String Matching", in Proc. 13th Combinatorial Pattern Matching (CPM '02). LNCS 2373. Berlin, Germany, Springer, New York. 203–224, 2002.
  11. Vidya Saikrishna, Akhtar Rasool and Nilay Khare, "Spam Filtering through Multiple Pattern Bit Parallel String Matching Combining Shift AND & OR", International Journal of Computer Applications 61(5):40-45,. Published by Foundation of Computer Science, New York, USA, January 2013.
  12. Vidya Saikrishna, Akhtar Rasool and Nilay Khare, "Time Efficient String Matching Solution for Single and Multiple Pattern using Bit Parallelism", International Journal of Computer Applications 46(6):15-20,. Published by Foundation of Computer Science, New York, USA, May 2012.
Index Terms

Computer Science
Information Sciences

Keywords

String Matching Bit Parallelism Shift OR String Matching N-Grams Automaton