Efficient Generation of Super Condensed Neighborhoods

Luís M. S. Russo, Arlindo L. Oliveira


Abstract:

Indexing methods for the approximate string matching problem spend a considerable effort generating condensed neighborhoods. Condensed neighborhoods, however, are not a minimal representation of a pattern neighborhood. Super condensed neighborhoods, proposed in this work, are smaller, provably minimal and can be used to locate approximate matches that can later be extended by on-line search. We present an algorithm for generating Super Condensed Neighborhoods. The algorithm can be implemented either by using dynamic programming or non-deterministic automata. The complexity is $ O(m s)$ for the first case and $ O(k m s)$ for the second, where $ m$ is the pattern size, $ s$ is the size of the super condensed neighborhood and $ k$ the number of errors. Previous algorithms depended on the size of the condensed neighborhood instead. These algorithms can be implemented using Bit-Parallelism and Increased Bit-Parallelism techniques. Our experimental results show that the resulting algorithms are fast and achieve significant speedups, when compared with the existing proposals that use condensed neighborhoods.





Luis Manuel Silveira Russo 2007-12-02