[Pdx-pm] searching for multiple strings

Wil Cooley wcooley at nakedape.cc
Mon May 16 14:10:18 PDT 2005


Also Sprach Michael Rasmussen <mikeraz at patch.com> on Mon, May 16, 2005 at 01:16:44PM PDT:
> I'm working on a little thing to watch log files for patterns and then
> modify firewall rules based on items found.
> 
> During benchmarking I came across something that said my internal Perl
> interpreter does not know what's going on.
> 
> These don't seem to be synonyms:
>   if( (/Illegal user/ || /User unk/ || /no such user/) ) {
>   if(/Illegal user|User unk|no such user/) {
> 
> Not only are they not synonymous, the performance difference is huge:
> 
>   [root at tire log]# ./tben maillog.1
>   42613 lines to process
>   Benchmark: timing 10 iterations of allinone, seps...
>     allinone: 71 wallclock secs (68.61 usr +  0.30 sys = 68.91 CPU) @  0.15/s (n=10)
>     seps:      3 wallclock secs ( 2.85 usr +  0.00 sys =  2.85 CPU) @  3.51/s (n=10)
>   found seps 36980  allinone 36980
> 
> 
> Um,  what's going on here?

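The first method performs up to three regular expression matches per line
(the || operator stops at the first one that succeeds), whereas the second
performs only one.  A timing difference that large is hard to account for,
though, especially since it favors the version doing more matching.
Performance differences aside, how are they not synonymous?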
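The ./tben script and maillog.1 were not posted, so what follows is only a
sketch of how such a comparison might be set up with the core Benchmark
module, using made-up log lines (an assumption, not the original data).  It
also checks that both forms accept exactly the same lines.  Expect the
timings to depend heavily on the perl version: Perl 5.10 and later compile
an alternation of literal strings into a trie, so the results may differ
considerably from the figures quoted above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Synthetic log lines standing in for maillog.1 (hypothetical data).
my @lines = (
    'sshd[123]: Illegal user admin from 10.0.0.1',
    'sendmail[456]: <bob@example.com>... User unknown',
    'postfix/smtpd[789]: reject: no such user here',
    'sshd[124]: Accepted password for alice from 10.0.0.2',
) x 250;

# Three separate matches; || short-circuits, so 1 to 3 matches per line.
sub count_seps {
    my $n = 0;
    for (@lines) {
        $n++ if /Illegal user/ || /User unk/ || /no such user/;
    }
    return $n;
}

# One match against a single alternation of the same three strings.
sub count_allinone {
    my $n = 0;
    for (@lines) {
        $n++ if /Illegal user|User unk|no such user/;
    }
    return $n;
}

# Both forms should accept exactly the same lines; only speed may differ.
my ($seps, $allinone) = (count_seps(), count_allinone());
print "found seps $seps  allinone $allinone\n";

# Run each sub for at least one CPU second and print a comparison table.
cmpthese(-1, {
    seps     => \&count_seps,
    allinone => \&count_allinone,
});
```

Because || stops at the first pattern that matches, the per-line cost of
the "seps" form depends on how often the earlier patterns succeed, so the
order of the tests can matter on real logs.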

> Secondary question, both methods look wrong to me.  As in there has
> to be a better way to do the search.  Especially when I'll eventually
> have N substrings to search for, some of them pulled from a config
> file specified by the user.

You want something that works like 'grep -f <patternfile>'?  It isn't
difficult to slurp in the file, chomp each line, join the lines with '|',
and use the resulting string as the RE.  You probably also want to
compile the RE with 'qr//', and to escape each line with 'quotemeta' if
the entries are meant as literal strings rather than regexes.
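A minimal sketch of that 'grep -f' style approach, assuming the config
file holds one literal string per line.  Skipping blank lines and lines
starting with '#' is an assumed convention, and the helper names
load_patterns and build_matcher are hypothetical.  'join' supplies the '|'
separators, and 'quotemeta' keeps characters such as '.' or '(' in
user-supplied strings from being treated as regex syntax.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read one search string per line from a filehandle,
# skipping blank lines and '#' comments (an assumed config convention).
sub load_patterns {
    my ($fh) = @_;
    my @strings;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/ or $line =~ /^\s*#/;
        push @strings, $line;
    }
    return @strings;
}

# Hypothetical helper: join the strings with '|' and compile once with qr//.
# quotemeta escapes metacharacters so each entry matches literally.
sub build_matcher {
    my @strings = @_;
    # An empty alternation would match every line, so return a pattern
    # that can never match when the list is empty.
    return qr/(?!)/ unless @strings;
    my $alt = join '|', map { quotemeta } @strings;
    return qr/$alt/;
}

# Demonstration with an in-memory "config file" (hypothetical contents).
my $config = "Illegal user\nUser unk\n\n# ssh and sendmail rejects\nno such user\n";
open my $cfg, '<', \$config or die "cannot open in-memory config: $!";
my @strings = load_patterns($cfg);
close $cfg;
my $matcher = build_matcher(@strings);

for my $line ('sshd: Illegal user admin', 'sshd: Accepted password for alice') {
    printf "%-5s %s\n", ($line =~ $matcher ? 'MATCH' : 'skip'), $line;
}
```

For a real log, open the user's config file instead of the in-memory
string and test each log line with "if ($line =~ $matcher) { ... }".  If
the entries are meant to be real regexes rather than literal strings, drop
the quotemeta, at the cost of letting a malformed entry break the whole
compiled pattern.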

Wil
-- 
Wil Cooley                                 wcooley at nakedape.cc
Naked Ape Consulting                        http://nakedape.cc
* * * * Linux, UNIX, Networking and Security Solutions * * * *