[Pdx-pm] searching for multiple strings

Michael G Schwern schwern at pobox.com
Mon May 16 13:55:04 PDT 2005


On Mon, May 16, 2005 at 01:16:44PM -0700, Michael Rasmussen wrote:
> These don't seem to be synonyms:
>   if( (/Illegal user/ || /User unk/ || /no such user/) ) {
>   if(/Illegal user|User unk|no such user/) {
> 
> Not only are they not synonymous, the performance difference is huge:

They should be synonymous in terms of functionality.
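
A quick way to convince yourself is to run both forms over a few lines and count disagreements. This is only a sketch; the sample lines are made up for illustration and are not from your logs.

```perl
use strict;
use warnings;

# Hypothetical sample lines: the first three should match, the last should not.
my @samples = (
    'sshd: Illegal user bob from 10.0.0.1',
    'smtpd: User unknown in local recipient table',
    'pop3d: no such user alice',
    'sshd: Accepted password for carol',
);

my $disagreements = 0;
for (@samples) {
    # Three separate matches joined with ||
    my $separate    = (/Illegal user/ || /User unk/ || /no such user/) ? 1 : 0;
    # One match using alternation
    my $alternation = /Illegal user|User unk|no such user/ ? 1 : 0;

    $disagreements++ if $separate != $alternation;
    printf "%-45s separate=%d alternation=%d\n", $_, $separate, $alternation;
}
print "disagreements: $disagreements\n";
```

If the two forms ever disagree on your real data, that would be worth posting along with the perl version.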


> Um,  what's going on here?

Not sure but could be the regex doing pathological backtracking.  This is
the sort of thing that changes from perl version to perl version.  What
version are you using?

The former is likely quick because it's basically doing this:

	index($_, "Illegal user") >= 0 ||
	index($_, "User unk")     >= 0 ||
	index($_, "no such user") >= 0

No backtracking, just a zippy substring search.  That's my guess.  You 
can hop in with "use re 'debug'" and see what's going on in the regex.
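
To test the guess rather than take my word for it, the core Benchmark module can time the variants side by side. A rough sketch follows; the log line and the sub names are invented for illustration, and the numbers will differ between perl versions and inputs.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical log line that matches none of the patterns (the common case).
my $line = 'May 16 13:16:44 host sshd[1234]: Failed password for bob from 10.0.0.1';

# Three separate regex matches joined with ||
sub separate {
    local $_ = shift;
    return /Illegal user/ || /User unk/ || /no such user/ ? 1 : 0;
}

# One regex using alternation
sub alternation {
    local $_ = shift;
    return /Illegal user|User unk|no such user/ ? 1 : 0;
}

# Plain substring search; index() returns -1 when the substring is absent,
# so the result has to be compared against 0.
sub by_index {
    my $s = shift;
    return 1 if index($s, 'Illegal user') >= 0;
    return 1 if index($s, 'User unk')     >= 0;
    return 1 if index($s, 'no such user') >= 0;
    return 0;
}

# Run each variant a fixed number of times and print a comparison table.
cmpthese(500_000, {
    separate    => sub { separate($line) },
    alternation => sub { alternation($line) },
    by_index    => sub { by_index($line) },
});
```

Try it with a line that matches as well as one that doesn't; the relative speeds may not be the same in both cases.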


> Secondary question, both methods look wrong to me.  As in there has to be 
> a better way to do the search.  Especially when I'll eventually have N 
> substrings to search for, some of them pulled from a config file specified 
> by the user.

Off the top of my head...

	use List::Util qw(first);

	my @Wrong_User_Pats = ('Illegal user', 'User unk', 'no such user');

	# index() returns -1 when the substring is not found, so test >= 0
	if( first { index($msg, $_) >= 0 } @Wrong_User_Pats ) {
		...
	}

first() is like grep() except that it stops at, and returns, the first match.

And, of course...

	sub is_wrong_user {
		my $msg = shift;
		# index() returns -1 when the substring is not found
		my $hit = first { index($msg, $_) >= 0 } @Wrong_User_Pats;
		return defined $hit ? 1 : 0;
	}

	if( is_wrong_user($msg) ) {
		...
	}
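
For the "N substrings from a config file" case, the same idea can be wrapped in a closure built from whatever list the user supplies. This is just a sketch under some assumptions: the config format (one literal substring per line, # for comments) and the name make_matcher are made up here, not anything from your setup.

```perl
use strict;
use warnings;

# Hypothetical config contents: one literal substring per line.
my $config_text = <<'END';
# substrings that indicate a bad user name
Illegal user
User unk
no such user
END

# Drop blank lines and comment lines.
my @patterns = grep { length($_) && !/^#/ } split /\n/, $config_text;

# Hypothetical helper: returns a sub that reports whether a message
# contains any of the given literal substrings.
sub make_matcher {
    my @substrings = grep { defined($_) && length($_) } @_;
    return sub {
        my $msg = shift;
        for my $s (@substrings) {
            # index() returns -1 when $s is absent from $msg
            return 1 if index($msg, $s) >= 0;
        }
        return 0;
    };
}

my $is_wrong_user = make_matcher(@patterns);

print $is_wrong_user->('sshd: Illegal user bob'), "\n";
print $is_wrong_user->('sshd: Accepted password for carol'), "\n";
```

Because the substrings are handed to index() and never compiled as regexes, metacharacters in the user's config are treated literally.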


-- 
Michael G Schwern     schwern at pobox.com     http://www.pobox.com/~schwern
Don't try the paranormal until you know what's normal.
	-- "Lords and Ladies" by Terry Pratchett

