Algorithmic Improvements in Pattern Matching

L. Dudás

doi:10.15866/ireit.v6i1.16102

Algorithmic Improvements in Pattern Matching

^(*) Corresponding author

DOI's assignment:
the author of the article can submit here a request for assignment of a DOI number to this resource!
Cost of the service: euros 10,00 (for a DOI)

Abstract

This paper shows two algorithmic improvements for pattern matching introducing two new heuristics. For test purposes the author implemented three well-known heuristics to ensure the O(n) worst case time, the O(n log( (m) /m) average case time and the O(n/m) best case time of searching for an m length pattern in an n length text that use a ( letter alphabet. The first new heuristic predicts the better search direction in the pattern preprocessing phase taking into consideration the unsymmetrical property of the pattern and using it in the search process. This method assumes that the full text resides in the RAM and exploits the fact that the memory reading is not direction dependent. The paper gives the prediction function for uniform character distribution and for natural language text alike and in addition presents a generalized form. The second new heuristic is effective on natural language texts and achieves a longer average jump than the pattern length, making compromise on completeness of search. The method exploits the fact that minimal distances can be determined between occurrences of different substrings for a language by a statistical preprocessing. Tests proved that both heuristics result in fewer jumps and character comparisons in the search phase of the pattern matching.
Copyright © 2018 Praise Worthy Prize - All rights reserved.

Keywords

Pattern Matching; Search Direction Prediction; DNA Search; Natural Language Text; Long Jumps

Full Text:

PDF

References

D. E. Knuth, J. H. Morris Jr, V. R. Pratt, Fast pattern matching in strings, SIAM Journal on Computing, Vol. 6, n. 2, pp.323-350, 1977.

C. Allauzen, M. Crochemore, M Raffinot, Efficient exprimental string matching by weak factor recognition, Proceedings of the 12th Conference on Combinatorial Pattern Matching (CPM'01), Vol. 2089, pp. 51-72, 2001.

R. S. Boyer, J. S. Moore, A fast string searching algorithm,
Communications of the Association for Computing Machinery, Vol. 20, n.10, pp. 762-772, 1977.

R. A. Baeza-Yates, G. Navarro, Text Searching: Theory and Practice, In C. Martin-Vide, V. Mitrana, G. Paun (Eds.): Formal Languages and Applications, (Berlin: Springer-Verlag, 2004, 565-597).

W. Ng. Chung, Inexact Pattern Matching Algorithms via Automata, BioChem 218. p.12, March 19, 2007.

G. Navarro, M. Raffinot, Fast and flexible string matching by combining bit parallelism and suffix automata, ACM Journal of Experimental Algorithms, Vol. 5, n. 4, pp. 1-36, 2000.

R .N. Hoorspool, Practical fast searching in strings, Software-Practice and Experience, Vol. 10, n. 3, pp. 501-506, 1980.

R. A. Baeza-Yates, Improved string searching, Software-Practice and Experience, Vol.19, n. 3, pp. 257-271, 1989.

Elixir Technologies Corporation, DesignPro Form Editor (Version 1.5, Xerox, 2004)

P. D.Michailidis, K. G. Margaritis,. On-line string matching algorithms: Survey and experimental results, International Journal of Computer Mathematics, Vol. 76, pp. 411-434, 2001.

European Parliament Proceedings Parallel Corpus 1996-2003 http://www.statmt.org/europarl/

Refbacks

There are currently no refbacks.

Username
Password
Remember me