[Pdx-pm] searching for multiple strings

Randall Hansen randall at sonofhans.net
Mon May 16 14:08:34 PDT 2005


On May 16, 2005, at 1:16 PM, Michael Rasmussen wrote:

> These don't seem to be synonyms:
>   if( (/Illegal user/ || /User unk/ || /no such user/) ) {
>   if(/Illegal user|User unk|no such user/) {
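
As a plain true/false test, both forms succeed on the same input. Where they can differ is in which substring ends up matched, and so in `$&`, `pos`, and any captures. A minimal sketch of that difference, assuming capture groups are added to the patterns above; the helper names and the sample log line are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: one regex with alternation.
# The engine reports the leftmost match in the string.
sub match_alternation {
    my ($line) = @_;
    return $line =~ /(Illegal user|User unk|no such user)/ ? "$1" : undef;
}

# Hypothetical helper: separate regexes joined with ||.
# Patterns are tried in the order written; the first one that
# matches anywhere in the string wins.
sub match_chained {
    my ($line) = @_;
    if (   $line =~ /(Illegal user)/
        || $line =~ /(User unk)/
        || $line =~ /(no such user)/ )
    {
        return "$1";
    }
    return undef;
}

# Made-up sample line containing two of the three phrases.
my $line = "no such user; Illegal user bob from 10.0.0.1";

print "alternation: ", match_alternation($line), "\n";   # no such user
print "chained:     ", match_chained($line), "\n";       # Illegal user
```

If the surrounding code only checks whether the `if` is true, this difference does not matter; it matters once the code reads the match variables afterwards.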

FWIW, the Camel book (Programming Perl) talks about this (pasted below).

r

----
(chapter 5, section 9):

Similarly, you should use Perl's control flow to decide which patterns 
to execute, and which ones to skip. A regular expression is pretty 
smart, but it's smart like a horse. It can get distracted if it sees 
too much at once. So sometimes you have to put blinders onto it. For 
example, you'll recall our earlier example of alternation:

/Gandalf|Saruman|Radagast/

That works as advertised, but not as well as it might, because it 
searches every position in the string for every name before it moves on 
to the next position. Astute readers of The Lord of the Rings will 
recall that, of the three wizards named above, Gandalf is mentioned 
much more frequently than Saruman, and Saruman is mentioned much more 
frequently than Radagast. So it's generally more efficient to use 
Perl's logical operators to do the alternation:

/Gandalf/ || /Saruman/ || /Radagast/

This is yet another way of defeating the "leftmost" policy of the 
Engine. It only searches for Saruman if Gandalf was nowhere to be seen. 
And it only searches for Radagast if Saruman is also absent.

Not only does this change the order in which things are searched, but 
it sometimes allows the regular expression optimizer to work better. 
It's generally easier to optimize searching for a single string than 
for several strings simultaneously. Similarly, anchored searches can 
often be optimized if they're not too complicated.
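
Whether the chained form is actually faster depends on the data and on the perl in use: Perl 5.10 and later compile an alternation of literal strings into a trie, so the advice above may not hold there. A rough way to check, sketched with the core Benchmark module; the sample text and the `%variants` name are assumptions for illustration:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up sample: long filler text with only the rarest name, at the very end.
my $text = ("Frodo and Sam walked on. " x 50) . "Radagast";

# Both variants return 1 on a match and 0 otherwise.
my %variants = (
    alternation => sub {
        $text =~ /Gandalf|Saruman|Radagast/ ? 1 : 0;
    },
    chained => sub {
        (      $text =~ /Gandalf/
            || $text =~ /Saruman/
            || $text =~ /Radagast/ ) ? 1 : 0;
    },
);

# A negative count means "run each variant for at least that many CPU seconds".
cmpthese(-1, \%variants);
```

Try it with text where Gandalf appears early as well; the chained form stops after its first successful pattern, so the ordering of the names matters for it.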
