[oak perl] Regular Expressions

David Fetter david at fetter.org
Wed Mar 10 19:51:16 CST 2004


On Wed, Mar 10, 2004 at 05:10:46PM -0800, Belden Lyman wrote:
> 
> On Wed, 2004-03-10 at 12:55, Tony Stubblebine wrote:
> > Thanks George.
> > 
> > I'm curious to see what sort of regular expressions people are writing, 
> > good and bad. And I'd love to start a discussion on regex style or 
> > technique.
> 
> I wanted to find all 2 letter words in /usr/dict/words, so cut over
> to Perl after finding myself a few pipes deep:
> 
>     $ grep '^..$' /usr/dict/words | grep -v '[A-Z][A-Z]' | grep -i '[aeiouy]'
> 
> So long as I was doing this in Perl, I decided to make the program find
> all N-length words, with N defaulting to 2.
> 
> If this looks like an exercise for someone learning (?=) and (?!),
> there's a reason for that ;)
> 
>     #!/usr/bin/perl -s
>     
>     use strict;
>     use warnings;
>     
>     our $length;
>     $length = 2 unless $length;
>     
>     @ARGV = '/usr/dict/words';
>     
>     print grep {
>                 m/
>                  (?=^[A-Za-z]{$length}$)  # entry must be $length letters long
>                  (?!^[A-Z]+$)             # ignore all-caps words
>                  (?i-:                    # this next part is case insensitive:
>                    (?:                    #    we must have either
>                      [aeiouy].            #      a vowel, then any letter
>                      |                    #      or
>                      .[aeiouy]            #      any letter, then a vowel
>                    )
>                  )
>                 /xo
>                }
>                (<>);
>     
>     __END__
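
For comparison, the three-stage grep pipeline quoted above maps almost one-for-one onto a Perl filter, with one test per grep. This is a minimal sketch that runs over a few hard-coded sample lines instead of /usr/dict/words. The sub name pipeline_keeps and the sample data are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One test per grep stage from the pipeline:
#   grep '^..$'          -> exactly two characters
#   grep -v '[A-Z][A-Z]' -> reject two adjacent capitals
#   grep -i '[aeiouy]'   -> keep only lines with a vowel or y
# (hypothetical helper name, for illustration only)
sub pipeline_keeps {
    my ($line) = @_;
    return $line =~ /^..$/          # $ also matches before the trailing newline
        && $line !~ /[A-Z][A-Z]/
        && $line =~ /[aeiouy]/i;
}

# Made-up sample input, newline-terminated like lines read from a file.
my @sample = ("an\n", "OK\n", "hm\n", "by\n", "cwm\n");
print grep { pipeline_keeps($_) } @sample;
```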

Belden, hats off for the ingenious use of regex, but... I don't quite
get this approach.  Why try to cram it all into one regex?
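
For anyone following along with (?=) and (?!): both are zero-width lookaheads. They check the text ahead without consuming any of it, so every assertion is tested from the same starting position and stacking them amounts to a logical AND. The sketch below checks that a pattern modeled on Belden's and the same rules written as separate tests agree when N is 2. The helper passes_separate_tests and the sample words are hypothetical. One thing it makes visible: the vowel alternation only ever looks at the first two letters, because the ^ inside the first lookahead pins the whole match to the start of the word, so for N greater than 2 a word like "strength" would be rejected.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $length = 2;

# Modeled on Belden's pattern. (?=...) and (?!...) are zero-width, so
# each assertion starts from the same position: together they act as AND.
my $one_regex = qr/
    (?=^[A-Za-z]{$length}$)    # exactly N letters
    (?!^[A-Z]+$)               # not all caps
    (?i:[aeiouy].|.[aeiouy])   # a vowel or y in the first two letters
/x;

# The same rules as separate, sequential tests (hypothetical helper).
sub passes_separate_tests {
    my ($word) = @_;
    return 0 unless $word =~ /^[A-Za-z]+$/;              # letters only
    return 0 unless length($word) == $length;            # exactly N letters
    return 0 if $word =~ /^[A-Z]+$/;                     # not all caps
    return 0 unless substr($word, 0, 2) =~ /[aeiouy]/i;  # vowel up front
    return 1;
}

# Made-up sample words, including a few that should be rejected.
my @sample = qw(an Ox OK by hm cwm Id xx a1);

for my $word (@sample) {
    my $regex_says = ($word =~ $one_regex) ? 1 : 0;
    my $tests_say  = passes_separate_tests($word);
    printf "%-4s regex=%d tests=%d\n", $word, $regex_says, $tests_say;
}
```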

Here's how I'd do a thing like this.  I suppose I lose obfuscation
points, but it's easy to use, understand, modify, maintain, &c., and
it's bumpin' fast.

#!/usr/bin/perl -wl
use strict;
use warnings;
use Getopt::Long;

my $file = '/usr/dict/words';
my $length = 2;
GetOptions(
  "length=i" => \$length
, "file=s"   => \$file
) or die "Usage: $0 [--length=N] [--file=PATH]\n";

open my $fh, '<', $file or die "Couldn't open $file: $!\n";
while (<$fh>) {
    chomp;
    next unless length == $length; # Quickly removes most things we don't want.
    next if $_ eq uc($_);          # No shouting.
    next unless /[aeiouwy]/i;      # cwm is a word.
                                   # more simple tests, if needed.
    print;                         # -l puts back the newline chomp removed.
}
close $fh;
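
The "more simple tests, if needed" comment suggests one way to grow this: keep the tests in a list of code refs and stop at the first one that fails. A minimal sketch, assuming a hypothetical keep_word helper and reading a few sample words from an in-memory filehandle instead of /usr/dict/words so it runs anywhere:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $length = 2;

# Each rule returns true if the word should be kept.
# Cheap tests go first so most words are discarded quickly.
my @rules = (
    sub { length($_[0]) == $length },   # right length
    sub { $_[0] ne uc $_[0] },          # no shouting
    sub { $_[0] =~ /[aeiouwy]/i },      # needs a vowel; cwm is a word
);

# Hypothetical helper: a word survives only if every rule passes.
sub keep_word {
    my ($word) = @_;
    for my $rule (@rules) {
        return 0 unless $rule->($word);
    }
    return 1;
}

# Made-up sample data in an in-memory "file".
my $text = join "\n", qw(an OK hm by Id xx cwm), '';
open my $fh, '<', \$text or die "Couldn't open in-memory file: $!\n";
my @kept;
while (my $line = <$fh>) {
    chomp $line;
    push @kept, $line if keep_word($line);
}
close $fh;
print "$_\n" for @kept;
```

Adding a rule is then one more line in @rules; the loop itself never changes.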

Cheers,
D
-- 
David Fetter david at fetter.org http://fetter.org/
phone: +1 510 893 6100   mobile: +1 415 235 3778

Remember to vote!


