[oak perl] Regular Expressions

Tony Stubblebine tonys at oreillynet.com
Thu Mar 11 11:25:52 CST 2004


Belden,

I like how you're using lookahead (and comments and whitespace).

Your third group checks that there's at least one vowel. The
alternation is fine for two-character words, but it only looks at the
first two characters, so it doesn't work for longer words like
"strength," where the first vowel is in position four.
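
To make that concrete, here is a minimal sketch that runs only the vowel alternation against a few sample words. The helper name vowel_in_first_two and the word list are made up for illustration. The leading ^ stands in for the anchored lookaheads in the full regex, which force the match to start at the beginning of the word.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: just the vowel alternation, anchored at the
# start of the word the way the full regex's lookaheads anchor it.
sub vowel_in_first_two {
    my ($word) = @_;
    return $word =~ m/^(?i:[aeiouy].|.[aeiouy])/ ? 1 : 0;
}

# Sample words chosen for illustration only.
for my $word (qw(at to strength)) {
    printf "%-8s %s\n", $word,
        vowel_in_first_two($word) ? 'passes' : 'fails';
}
```

"at" and "to" pass, but "strength" fails even though it contains a vowel, because neither of its first two letters is one.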

Does anyone know of a word that starts with four consonants?
 
    #!/usr/bin/perl -s
     
    use strict;
    use warnings;
     
    our $length;
    $length = 8 unless $length;
 
    # A word can't start with more than three consonants (AFAIK)
    my $max_consonants = $length < 3 ? $length : 3;
     
    @ARGV = '/usr/share/dict/words';
     
    print grep {
                m/
                 (?=^[A-Za-z]{$length}$)  # entry must be $length letters long
                 (?!^[A-Z]+$)             # ignore all-caps words
                 (?i:
                                          # Check for a vowel early in the
                                          # word.
                   (?=^[^aeiouy]{0,$max_consonants}
                       [aeiouy] )
                 )
                /xo
               }
 
               (<>);
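
If you want to try the three lookaheads without a dictionary file, something like the following sketch should do. The function name is_candidate and the sample words are assumptions of mine, not part of the script above. I've dropped /o here because the function gets called with different lengths, and /o would freeze the pattern at the first one.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical wrapper around the same three lookaheads, so they can be
# tried on a handful of words instead of a whole dictionary.
sub is_candidate {
    my ($word, $length) = @_;
    my $max_consonants = $length < 3 ? $length : 3;
    return $word =~ m/
        (?=^[A-Za-z]{$length}$)     # entry must be $length letters long
        (?!^[A-Z]+$)                # ignore all-caps words
        (?i:
          (?=^[^aeiouy]{0,$max_consonants}
              [aeiouy] )            # a vowel early in the word
        )
    /x ? 1 : 0;
}

# Sample inputs chosen for illustration only.
for my $test ( [ strength => 8 ], [ rhythm => 6 ], [ NATO => 4 ], [ hm => 2 ] ) {
    my ($word, $length) = @$test;
    printf "%-8s (length %d): %s\n", $word, $length,
        is_candidate($word, $length) ? 'kept' : 'dropped';
}
```

"strength" and "rhythm" are kept (y counts as a vowel here), while "NATO" is dropped for being all caps and "hm" for having no vowel.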





Belden Lyman wrote:

>On Wed, 2004-03-10 at 12:55, Tony Stubblebine wrote:
>  
>
>>Thanks George.
>>
>>I'm curious to see what sort of regular expressions people are writing, 
>>good and bad. And I'd love to start a discussion on regex style or 
>>technique.
>>
>>    
>>
>
>I wanted to find all 2 letter words in /usr/dict/words, so cut over
>to Perl after finding myself a few pipes deep:
>
>    $ grep '^..$' /usr/dict/words | grep -v '[A-Z][A-Z]' | grep -i '[aeiouy]'
>
>So long as I was doing this in Perl, I decided to make the program find
>all N-length words, with N defaulting to 2.
>
>If this looks like an exercise for someone learning (?=) and (?!),
>there's a reason for that ;)
>
>    #!/usr/bin/perl -s
>    
>    use strict;
>    use warnings;
>    
>    our $length;
>    $length = 2 unless $length;
>    
>    @ARGV = '/usr/dict/words';
>    
>    print grep {
>                m/
>                 (?=^[A-Za-z]{$length}$)  # entry must be $length letters long
>                 (?!^[A-Z]+$)             # ignore all-caps words
>                 (?i-:                    # this next part is case insensitive:
>                   (?:                    #    we must have either
>                     [aeiouy].            #      a vowel, then any letter
>                     |                    #      or
>                     .[aeiouy]            #      any letter, then a vowel
>                   )
>                 )
>                /xo
>               }
>               (<>);
>    
>    __END__
>
>Belden
>
>_______________________________________________
>Oakland mailing list
>Oakland at mail.pm.org
>http://mail.pm.org/mailman/listinfo/oakland
>  
>



