[Purdue-pm] Regex Unification
Mark Senn
mark at ecn.purdue.edu
Mon May 16 08:31:05 PDT 2011
> I have this bank of regexes in my code.
>
> $requests2{ $request }->{ $href->{ accession_id } }
> ->{ library } =~ s/\W/\-/g ;
> $requests2{ $request }->{ $href->{ accession_id } }
> ->{ library } =~ s/_/\-/g ;
> $requests2{ $request }->{ $href->{ accession_id } }
> ->{ library } =~ s/-+/-/g ;
> $requests2{ $request }->{ $href->{ accession_id } }
> ->{ library } =~ s/-$//g ;
>
> I'm thinking that I can simplify this a lot.
>
> Change the \W and _ regexes to \W+ and _+ and you reduce the need for
> the s/-+/-/, so we can reduce it, I think, to
>
> $requests2{ $request }->{ $href->{ accession_id } }
> ->{ library } =~ s/[\W_-]+/\-/g ;
>
> I'd have to bash that regex before I'm comfortable putting it into
> production.
>
> But the last part, getting rid of dashes at the end of a string, can I
> roll that into the bigger regex? I'm not seeing how right now.
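Separately from the regex itself, the long `$requests2{ $request }->{ $href->{ accession_id } }->{ library }` expression does not need to be repeated for every substitution. Here is a minimal sketch that relies on Perl's `for` aliasing: inside the loop, `$_` is an alias for the hash element, so bare `s///` statements edit it in place. The sample data (`REQ1`, `ACC1`, and the library string) is hypothetical and only mimics the shape of the hash in the question.

```perl
use strict;
use warnings;

# Hypothetical sample data shaped like the nested hash in the question.
my %requests2 = (
    REQ1 => { ACC1 => { library => 'My_Lib  (v2)--' } },
);
my $request = 'REQ1';
my $href    = { accession_id => 'ACC1' };

# foreach aliases $_ to the hash element itself, so the substitutions
# below modify it in place. The '->' between adjacent subscripts is
# optional in Perl, so it is omitted here.
for ( $requests2{$request}{ $href->{accession_id} }{library} ) {
    s/[\W_-]+/-/g;    # Collapse each run of nonword, '_', or '-' to one '-'.
    s/-$//;           # Delete a trailing '-'.
}

print $requests2{$request}{ $href->{accession_id} }{library}, "\n";  # prints "My-Lib-v2"
```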
I prefer the second of the three solutions below.
#!/usr/local/bin/perl
use strict;
use warnings;

# Test string: nonword characters, underscores, and dashes around "abc".
my $s = '!@#$%^&*(){}--__--}{)(*&^%$#@!abc--';

# Solution 1: the original four substitutions.
$_ = $s . "\n" . $s . "\n";
s/\W/\-/g;
s/_/\-/g;
s/-+/-/g;
s/-$//g;
print "$_\n";

# Solution 2:
$_ = $s . "\n" . $s . "\n";
s/[\W_]/-/g;     # Change nonword or '_' characters to '-' everywhere.
                 # This will change any newlines to '-'.
s/-+/-/g;        # Change two or more consecutive '-' characters
                 # to one '-' everywhere.
s/-$//;          # Delete any '-' at the end of the string.
print "$_\n";

# Solution 3:
$_ = $s . "\n" . $s . "\n";
s/[\W_-]+/-/g;   # Change each run of nonword, '_', or '-' characters
                 # to a single '-' everywhere.
                 # This will change any newlines to '-'.
s/-$//;          # Delete any '-' at the end of the string.
print "$_\n";
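To answer the question of rolling the trailing-dash removal into the bigger regex: it can be done in one substitution, though the result is harder to read. This is a minimal sketch, not from the original post. It uses an alternation plus the `/e` modifier: a run anchored at `\z` (the very end of the string) is replaced with nothing, and any other run becomes '-'. The `clean` subroutine name is hypothetical.

```perl
use strict;
use warnings;

# Collapse runs of nonword, '_', or '-' characters to a single '-',
# except that a run at the very end of the string is deleted.
sub clean {
    my ($str) = @_;
    # Alternative 1 matches a trailing run (anchored with \z) and has no
    # capture group, so $1 is undefined and the run is replaced with ''.
    # Alternative 2 captures any other run, which is replaced with '-'.
    $str =~ s/[\W_-]+\z|([\W_-]+)/defined $1 ? '-' : ''/ge;
    return $str;
}

my $s = '!@#$%^&*(){}--__--}{)(*&^%$#@!abc--';
print clean( $s . "\n" . $s . "\n" ), "\n";    # prints "-abc-abc"
```

The single substitution gives the same output as the three solutions above on this test string, but the extra alternation and the `/e` code make it harder to follow than two plain substitutions.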
-mark