[Pdx-pm] searching for multiple strings
Randall Hansen
randall at sonofhans.net
Mon May 16 14:08:34 PDT 2005
On May 16, 2005, at 1:16 PM, Michael Rasmussen wrote:
> These don't seem to be synonyms:
> if( (/Illegal user/ || /User unk/ || /no such user/) ) {
> if(/Illegal user|User unk|no such user/) {
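A minimal sketch, using made-up log lines, of what the two conditions share. In boolean context on $_ (and without /g), both accept exactly the same lines. What differs is how the Engine goes about the search and which text ends up in $& afterwards, not which lines pass.

```perl
use strict;
use warnings;

# Hypothetical sample log lines; substitute real input.
my @lines = (
    'sshd[101]: Illegal user admin from 10.0.0.5',
    'login: User unknown',
    'su: no such user: bob',
    'sshd[102]: Accepted password for alice',
);

my (@or_hits, @alt_hits);
for (@lines) {
    # Form 1: three separate matches against $_, joined with ||
    push @or_hits,  $_ if /Illegal user/ || /User unk/ || /no such user/;
    # Form 2: a single pattern using alternation
    push @alt_hits, $_ if /Illegal user|User unk|no such user/;
}

print "|| form matched:          ", scalar(@or_hits),  "\n";
print "alternation form matched: ", scalar(@alt_hits), "\n";
```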
FWIW, the Camel (Programming Perl) talks about this (pasted below).
r
----
(chapter 5, section 9):
Similarly, you should use Perl's control flow to decide which patterns
to execute, and which ones to skip. A regular expression is pretty
smart, but it's smart like a horse. It can get distracted if it sees
too much at once. So sometimes you have to put blinders onto it. For
example, you'll recall our earlier example of alternation:
/Gandalf|Saruman|Radagast/
That works as advertised, but not as well as it might, because it
searches every position in the string for every name before it moves on
to the next position. Astute readers of The Lord of the Rings will
recall that, of the three wizards named above, Gandalf is mentioned
much more frequently than Saruman, and Saruman is mentioned much more
frequently than Radagast. So it's generally more efficient to use
Perl's logical operators to do the alternation:
/Gandalf/ || /Saruman/ || /Radagast/
This is yet another way of defeating the "leftmost" policy of the
Engine. It only searches for Saruman if Gandalf was nowhere to be seen.
And it only searches for Radagast if Saruman is also absent.
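A small sketch of the "leftmost" point, reusing the wizard names from the text. The helper try_name and the @tried list are hypothetical, there only to record which patterns Perl actually runs. With Saruman appearing before Gandalf in the string, the single alternation reports the leftmost name, while the || chain reports Gandalf and never runs the other two patterns.

```perl
use strict;
use warnings;

my $text = 'Saruman spoke with Gandalf at Orthanc';

# One regex with alternation: at each position the Engine tries every
# name before moving right, so the leftmost name in the string wins.
my ($alt_match) = $text =~ /(Gandalf|Saruman|Radagast)/;

# Separate matches joined with ||: Perl runs the patterns in order and
# stops at the first one that succeeds anywhere in the string.
my @tried;    # hypothetical log of which patterns were attempted
sub try_name {    # hypothetical helper, for illustration only
    my ($name) = @_;
    push @tried, $name;
    return $text =~ /(\Q$name\E)/ ? $1 : undef;
}
my $or_match = try_name('Gandalf') || try_name('Saruman') || try_name('Radagast');

print "alternation found: $alt_match\n";   # Saruman (leftmost)
print "|| chain found:    $or_match\n";    # Gandalf (first pattern tried)
print "patterns tried:    @tried\n";       # Gandalf
```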
Not only does this change the order in which things are searched, but
it sometimes allows the regular expression optimizer to work better.
It's generally easier to optimize searching for a single string than
for several strings simultaneously. Similarly, anchored searches can
often be optimized if they're not too complicated.
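The efficiency claim can be checked on your own data with the core Benchmark module. The corpus below is invented to mimic the Gandalf > Saruman > Radagast frequencies. Timings depend on your perl version (the regex optimizer has changed a good deal since this was written) and on your real input, so treat this as a way to measure rather than a prediction.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical corpus: Gandalf common, Saruman rarer, Radagast rarest.
my $filler = 'the road goes ever on and on ' x 10;
my @lines = (
    ("$filler Gandalf")  x 60,
    ("$filler Saruman")  x 25,
    ("$filler Radagast") x 5,
    ($filler)            x 10,
);

# Count matching lines with a single alternation.
sub count_alternation {
    my $n = 0;
    for (@lines) { $n++ if /Gandalf|Saruman|Radagast/ }
    return $n;
}

# Count matching lines with separate patterns joined by ||.
sub count_logical_or {
    my $n = 0;
    for (@lines) { $n++ if /Gandalf/ || /Saruman/ || /Radagast/ }
    return $n;
}

# Both must agree before the timings mean anything.
die "counts differ\n" unless count_alternation() == count_logical_or();

# Run each for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    alternation => \&count_alternation,
    logical_or  => \&count_logical_or,
});
```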