LPM: regex snippets

Frank Price fprice at mis.net
Thu Nov 11 01:18:43 CST 1999


Hi lexpm!  I have been dealing with some thorny regexes (thorny for
me, that is) and thought I'd share some.  Sorry this is kind of long.
Interested in all comments ...

First some background: this script is a crontab filter; it takes a
crontab entry (crontab is the *nix facility for automatic job
scheduling) and presents it in a more human readable format.  A
typical entry looks like this:

   1,11,21,31,41,51 1-5,17-23 * * 2 /usr/local/bin/blah

Which means "run /usr/local/bin/blah every Tuesday at 1, 11, 21, 31,
41, and 51 minutes past each hour from 1 to 5 am and from 5 to 11 pm".
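
For readers who have not seen the format, here is a minimal sketch of
pulling such an entry apart into its five schedule fields plus the
command.  The name parse_entry is made up for illustration and is not
part of the script discussed here; it also assumes a plain entry with
no comments or environment lines.

```perl
use strict;
use warnings;

# Hypothetical helper: split a crontab line into minute, hour,
# day-of-month, month, day-of-week and the command to run.
# The limit of 6 keeps any spaces inside the command intact.
sub parse_entry {
    my ($entry) = @_;
    my ($min, $hour, $mday, $month, $wday, $command) = split ' ', $entry, 6;
    return ($min, $hour, $mday, $month, $wday, $command);
}

my @fields = parse_entry('1,11,21,31,41,51 1-5,17-23 * * 2 /usr/local/bin/blah');
print join(' | ', @fields), "\n";
```

The minute and hour fields are the kind of string that Task 1 below
has to expand.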

Task 1)  Take a string, which may contain commas and also ranges, and
	 return a list of all the numbers.  Ex: for "1-5,9,12" it
	 should yield (1,2,3,4,5,9,12).

Code 1) 
    @list=splitcommas($entry);

    sub splitcommas {
        my ($string) = @_;
     
        if ( $string =~ s/(\d+)(-)(\d+)/join(',', ($1 .. $3))/e ) {
             splitcommas($string);
        } elsif ( $string =~ /,/ ) { # if at least one comma, split on it
             split(',', $string);
        } else { ($string); } # handles no commas or dash case; i.e. single num
     }
     
Comment 1)
  The main work is done with the substitute in the "if".  This pattern
  says "if you find two sets of integers joined by a dash, replace it
  with the inclusive range of those numbers joined by commas".  Since
  there is no /g, only the first range is replaced on each call, so we
  recurse on this function; that handles any remaining ranges and
  finally splits on the commas.  The cool thing about using the /e
  modifier is that the right hand side can be a perl expression.  This
  let me just replace the ranges by a comma separated string, and it
  kept the entries in order.  Took a while to get this one :-)
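
  As a point of comparison, the same idea can probably be written
  without recursion by adding /g so that every range is expanded in one
  pass.  This is only a sketch; the name expand_list is made up, and it
  assumes each range is written low-high (a reversed range such as
  "5-1" would expand to nothing).

```perl
use strict;
use warnings;

# Hypothetical non-recursive variant: /g expands every "a-b" range,
# /e evaluates the replacement as Perl code, then we split on commas.
sub expand_list {
    my ($string) = @_;
    $string =~ s/(\d+)-(\d+)/join(',', $1 .. $2)/ge;
    return split /,/, $string;
}

my @hours = expand_list('1-5,9,12');
print "@hours\n";   # prints "1 2 3 4 5 9 12"
```

  A string with no comma and no dash comes back from split as a
  one-element list, so the single-number case needs no branch of its
  own here.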


Task 2) Pad all single digits in comma separated string with leading
        zero.  So "1,3,5,10,12" yields "01,03,05,10,12".

Code 2)
   $string =~ s/(,|^)(\d)(?=(?:,|$))/${1}0$2/g;  

Comment 2)
  Still not sure /exactly/ what's going on here!  The lhs says "match
  either a comma or start-of-line; then a single digit; then just look
  ahead to see if the next character is either a comma or
  end-of-line."  Parens around the first two make it remember the
  match.  Then the rhs says "replace that with (comma or start-line)
  followed by 0 followed by the digit."  The key (I think) is that the
  look ahead is what they call zero-width: it checks what comes next
  but doesn't consume it, so the trailing comma is not part of the
  match.  That's why I don't have to put the trailing comma/end-string
  back in, and with /g it also leaves that comma free to start the
  next match, so two single digits in a row (like "1,3") both get
  padded.

  Another easier way would be to split on commas, pad each number with
  s/^(\d)$/0$1/, and then join again with commas.  TMTOWTDI...
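
  A rough sketch of that split/pad/join route follows.  Note that it
  swaps in sprintf for the s/^(\d)$/0$1/ padding, and the name pad_list
  is made up; it assumes every element is a plain non-negative integer
  (a "*" field would not survive it).

```perl
use strict;
use warnings;

# Hypothetical helper: split on commas, zero-pad each number to at
# least two digits with sprintf, and join the pieces back together.
sub pad_list {
    my ($string) = @_;
    return join ',', map { sprintf '%02d', $_ } split /,/, $string;
}

print pad_list('1,3,5,10,12'), "\n";   # prints "01,03,05,10,12"
```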


Task 3) Take a list of numbers and change each to the correct ordinal
        representation.  Ex. (1,11,21) yields (1st, 11th, 21st).

Code 3)
   foreach $day (@days) {
      if    ( $day =~ /^1$/ || $day =~ /[^1]1$/ ) { $day .= "st" }
      elsif ( $day =~ /^2$/ || $day =~ /[^1]2$/ ) { $day .= "nd" }
      elsif ( $day =~ /^3$/ || $day =~ /[^1]3$/ ) { $day .= "rd" }
      else                                         { $day .= "th" }
   }

Comment 3)
   This is the one I mentioned at the meeting.  Someone suggested a
   hash and that's a good solution in this case; but maybe not if the
   range gets bigger than 30 numbers!  I would have liked to put the
   match into one regex but thought it might slow it down.  Here the
   logic is "if the number is a 1, or ends in a non-1 and then a 1,
   add an "st" to it."  So on for 2 and 3; everything else falls thru
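   to the "th" case.  The [^1] is what keeps 11, 12 and 13 out of the
   first three branches; the order of the two disjuncts should not
   matter, since a number can match at most one of them.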
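
   For what it's worth, here is one way the match might be folded into
   a single regex, using a negative lookbehind in place of the two
   disjuncts.  This is a sketch, not a benchmark: the name ordinal and
   the %suffix lookup are made up for illustration, and I make no claim
   about whether it is faster or slower.

```perl
use strict;
use warnings;

# Hypothetical single-regex variant.  (?<!1) is a zero-width negative
# lookbehind: the final digit must not be preceded by a 1, which
# rules out 11, 12 and 13 (and 111, 112, ...).
sub ordinal {
    my ($n) = @_;
    my %suffix = (1 => 'st', 2 => 'nd', 3 => 'rd');
    return $n . $suffix{$1} if $n =~ /(?<!1)([123])$/;
    return $n . 'th';
}

print join(', ', map { ordinal($_) } 1, 2, 3, 4, 11, 12, 13, 21, 22, 23, 31), "\n";
# prints "1st, 2nd, 3rd, 4th, 11th, 12th, 13th, 21st, 22nd, 23rd, 31st"
```

   The lookbehind also succeeds at the very start of the string, so the
   separate /^1$/ style test for a lone digit is not needed.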


Thanks for listening, and please tell me if you see better ways to do
any of this!

-Frank.
--
Frank Price
fprice at mis.net

sub splitcommas {
   # Takes a string with possible commas and ranges (-) and
   # returns an array consisting of its elements.
   # E.g., hours entry = "0-6,9,12,15,17,19-23"
   # gives @=(0,1,2,3,4,5,6,9,12,15,17,19,20,21,22,23);
   my ($string) = @_;

   # This pattern says "if you find two sets of integers joined by a
   # dash, replace it with the inclusive range of those numbers joined
   # by commas".  Then we recurse on this function to split the commas
   if ( $string =~ s/(\d+)(-)(\d+)/join(',', ($1 .. $3))/e ) {
        splitcommas($string);
   } elsif ( $string =~ /,/ ) { # if at least one comma, split on it
        split(',', $string);
   } else { ($string); } # handles no commas or dash case; i.e. single num
}




