I'm finding that this algorithm is useful in a number of situations. Out in the real world, it is found in Network Intrusion Scanners, grep engines, as well as text processing.
For me, I'm working on an implementation for a RADIUS server for authorizing and
logging calls for a voice over ip service. Some vendor specific attributes don't
have a unique numeric identifier, the identifier is simply a string. Being able to do a
fast match would be nice in this situation. Currently I'm using C++'s unordered_map
to facilitate identification of the string. The Wu Manber algorithm may be a suitable
alternative.
On another front, I subscribe to a real time news service. Passing this news service
through a Multi-Pattern Search Algorithm would make for a processor efficient method of
filtering for news items in which I'm interested.
Wu-Manber
stand on the
shoulders of their Multi Pattern Search predecessors: Aho and
Corasick (with a linear time scanner based upon an automata approach), Commentz-Walter
(who combined Aho-Corasick with the Boyer-Moore string search algorithm), and Baeza-Yates
(with a slightly different combination of Aho-Corasick and Boyer-Moore-Horspool).
For those who are interested, here is a Visual Studio implmentation of Wu Manber's
algorithm. It should be
readily transportable to other platforms with little or no change, other the the _tmain
construct. The implementation relies on C++ Standard Template Library for various
containers. A table driven character comparison approach is used to make it easy
to choose between case sensitive and case insensitive matches.
The next step would be turn it into a template to make it useful on various types of
alphabets. It all needs to be redone a bit in order to accept a 'functor' or a 'delegate'
so that when a match is found, an appropriate routine can be invoked.
There is another paper out there called "An Improved Wu-Manber Multiple Patterns Matching
Algorithm" that purports to be faster. The charts in the paper say it is a bit faster. I'm
wondering if it is 'faster enough' to be of value implementing.