[Purdue-pm] Regex Unification

Mon May 16 08:31:05 PDT 2011

  > I have this bank of regexes in my code.
  > 
  >             $requests2{ $request }->{ $href->{ accession_id } }
  >                 ->{ library } =~ s/\W/\-/g ;
  >             $requests2{ $request }->{ $href->{ accession_id } }
  >                 ->{ library } =~ s/_/\-/g ;
  >             $requests2{ $request }->{ $href->{ accession_id } }
  >                 ->{ library } =~ s/-+/-/g ;
  >             $requests2{ $request }->{ $href->{ accession_id } }
  >                 ->{ library } =~ s/-$//g ;
  > 
  > I'm thinking that I can simplify this a lot.
  > 
  > Change the \W and _ regexes to \W+ and _+ and you reduce the need for
  > the s/-+/-/, so we can reduce it, I think, to
  > 
  >             $requests2{ $request }->{ $href->{ accession_id } }
  >                 ->{ library } =~ s/[\W_-]+/\-/g ;
  > 
  > I'd have to bash that regex before I'm comfortable putting it into
  > production.
  > 
  > But the last part, getting rid of dashes at the end of a string, can I
  > roll that into the bigger regex? I'm not seeing how right now.

I prefer the second of the three solutions below. The first is your
original four substitutions; the third uses your combined [\W_-]+ class.

#!/usr/local/bin/perl

$s = '!@#$%^&*(){}--__--}{)(*&^%$#@!abc--';
$_ = $s . "\n" . $s . "\n";  # Two copies, so an embedded newline is tested.
s/\W/\-/g;
s/_/\-/g;
s/-+/-/g;
s/-$//g;
print "$_\n";

$s = '!@#$%^&*(){}--__--}{)(*&^%$#@!abc--';
$_ = $s . "\n" . $s . "\n";
s/[\W_]/-/g;  # Change nonword or '_' characters to '-' everywhere.
              # This will change any newlines to '-'.
s/-+/-/g;     # Change two or more consecutive '-' characters
              # to one '-' everywhere.
s/-$//;       # Delete any '-' at the end of the string.
print "$_\n";

$s = '!@#$%^&*(){}--__--}{)(*&^%$#@!abc--';
$_ = $s . "\n" . $s . "\n";
s/[\W_-]+/-/g;  # Change consecutive nonword, '_', or '-'  characters
                # to '-' everywhere.
                # This will change any newlines to '-'.
s/-$//;         # Delete any '-' at the end of the string.
print "$_\n";
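
On rolling the trailing-dash removal into the bigger regex: one way that
seems to work is an alternation with /e, where a run that reaches the end
of the string is deleted and any other run becomes a single '-'. This is
only a sketch, not something I have run against production data. The sub
name clean_library is made up for illustration, and I use \z rather than $
so the end-of-string test is unambiguous.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original code): collapse every run of
# nonword, '_', or '-' characters to a single '-', except a run that
# reaches the end of the string, which is deleted.
sub clean_library {
    my ($text) = @_;
    # First alternative: a run ending at the very end (\z); $1 stays undef.
    # Second alternative: any other run, captured in $1.
    $text =~ s/[\W_-]+\z|([\W_-]+)/defined $1 ? '-' : ''/ge;
    return $text;
}

my $s = '!@#$%^&*(){}--__--}{)(*&^%$#@!abc--';
print clean_library( $s . "\n" . $s . "\n" ), "\n";   # prints -abc-abc
```

Whether this is clearer than two plain substitutions is debatable. The
/e version saves a line but takes more effort to read, which is why I would
still bash on it before using it.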

-mark