[oak perl] Regular Expressions

Thu Mar 11 13:25:35 CST 2004

On Thu, Mar 11, 2004 at 08:52:52AM -0800, Belden Lyman wrote:
> On Wed, 2004-03-10 at 17:51, David Fetter wrote:
> > On Wed, Mar 10, 2004 at 05:10:46PM -0800, Belden Lyman wrote:
> >
> > Belden, hats off for the ingenious use of regex, but...I don't
> > quite get this approach.  Why try to cram it all into one regex?
> > 
> 
> To prove to myself that it can all be done in one regex.

Cool :)

> > Here's how I'd do a thing like this.  I suppose I lose obfuscation
> > points, but it's easy to use, understand, modify, maintain, &c.,
> > and it's bumpin' fast.
> 
> Sure, I wouldn't use anything like the above (err, the snipped?)

heh

> in production code, exactly for the reasons you mentioned.  It was
> an exercise, not much more.

Roight.

> > #!/usr/bin/perl -wl
> > use strict;
> > use warnings;
> > use Getopt::Long;
> > 
> > my $file = '/usr/dict/words';
> > my $length = 2;
> > my $result = GetOptions(
> >   "length=i" => \$length
> > , "file=s"   => \$file
> > );
> > 
> > open F, "<$file" or die "Couldn't open $file: $!\n";
> > while(<F>) {
> >     chomp;
>       next unless /^\w+$/;           # ignore contractions
> >     next unless length == $length; # Quickly removes most things we don't want.
> >     next if $_ eq uc($_);          # No shouting.
> >     next unless /[aeiouwy]/io;     # cwm is a word.
> >                                    # more simple tests, if needed.
> >     print;
> > }
> > close F;
> 
> Benchmarking certainly upholds your claim of bumpin' fast!

:)

I suspect it might be even faster if you put the length test first.
The algorithmic principle for mine is that it discards first: think of
an increasingly fine-meshed set of filters on a water intake.  First,
something that excludes furniture, then something that excludes
bottles & cans, then bits of paper, then sand, then volatile organics,
sulfur...
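
To make the filter idea concrete, here's a rough sketch with the length
test moved to the front.  It's an illustration, not a benchmark: the
word list is made up so it runs without /usr/dict/words, and keep_word
is just a name I picked for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data standing in for the dictionary file.
my @words  = qw(cwm it's NASA to be OK hm ox by);
my $length = 2;

# Coarsest, cheapest test first; each later test only sees
# what the earlier ones let through.
sub keep_word {
    my ($word) = @_;
    return 0 unless length($word) == $length;  # wrong size: gone at once
    return 0 unless $word =~ /^\w+$/;          # ignore contractions
    return 0 if $word eq uc $word;             # no shouting
    return 0 unless $word =~ /[aeiouwy]/i;     # needs a vowel-ish letter
    return 1;
}

my @kept = grep { keep_word($_) } @words;
print "$_\n" for @kept;
```

With $length set to 2, six of the nine sample words fail the length
comparison and never reach a regex.  Whether that ordering wins on a
real dictionary depends on the data, so it's worth timing both ways.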

Cheers,
D
-- 
David Fetter david at fetter.org http://fetter.org/
phone: +1 510 893 6100   mobile: +1 415 235 3778

Remember to vote!