[tpm] I wish I was better at regex's

Rob Janes janes.rob at gmail.com
Wed Mar 9 10:14:58 PST 2011


here's one that dequotes the key and value ...

#!/usr/bin/env perl

use strict;
use warnings;

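# Strips quotes and backslash escapes from $_[0] in place ($_[0] aliases the
# caller's variable). Returns the number of tokens seen; 0 leaves it untouched.
sub dequote($) {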
  my $quoted = $_[0];
  my $dequoted = "";
  my $counter = 0;

  while ($quoted =~ m{
        (
          (?:[^=\\"';\s]|
            \\.|
            "(?:[^"\\]|\\.)*"|
            '(?:[^'\\]|\\.)*'
          )
        )
        }gx)
  {
    my $piece = $1;
    if (substr($piece, 0, 1) eq '\\') {
      $dequoted .= substr($piece, 1, 1);
    } elsif (substr($piece, 0, 1) eq '"' || substr($piece, 0, 1) eq "'") {
      my $temp = substr($piece, 1, -1);
      $temp =~ s/\\(.)/$1/g;
      $dequoted .= $temp;
    } else {
      $dequoted .= $piece;
    }

    $counter++;
  }

  $_[0] = $dequoted if $counter;
  return $counter;
}

while (<>) {
  chomp;

  print "\n\n================\nline: $_\n";

  if (m{
        ^
        (
          (?:[^=\\"';\s]|
            \\.|
            "(?:[^"\\]|\\.)*"|
            '(?:[^'\\]|\\.)*'
          )+
        )
         \s* = \s*
        (
          (?:[^=\\"';\s]|
            \\.|
            "(?:[^"\\]|\\.)*"|
            '(?:[^'\\]|\\.)*'
            )+
        )
        (?:\s*;\s*(.*))?
        $
        }x)
  {
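    my ($key, $value, $comment) = ($1, $2, $3);
    # the comment group is optional; avoid an "uninitialized" warning below
    $comment = '' unless defined $comment;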
    print "matches!\n";

    dequote($key);
    dequote($value);
    print "key is $key\nvalue is $value\ncomment is $comment\n";
  }
  else
  {
     print "match FAILED!\n";
  }
}

and the output is ...

================
line: key="value"
matches!
key is key
value is value
comment is


================
line: key=value
matches!
key is key
value is value
comment is


================
line: key="value1;value2"
matches!
key is key
value is value1;value2
comment is


================
line: key="value1;value2"     ; comment
matches!
key is key
value is value1;value2
comment is comment


================
line: key='value1;value2'     ; comment
matches!
key is key
value is value1;value2
comment is comment


================
line: "key"="value1"
matches!
key is key
value is value1
comment is


================
line: "key"="value1"          ; comment
matches!
key is key
value is value1
comment is comment


================
line: "key"="value1;value2"
matches!
key is key
value is value1;value2
comment is


================
line: "key"="value1;value2"   ; comment
matches!
key is key
value is value1;value2
comment is comment


================
line: "key"="val\"ue1;value2"
matches!
key is key
value is val"ue1;value2
comment is


================
line: "key"="val\"ue1;value2" ; comment
matches!
key is key
value is val"ue1;value2
comment is comment


================
line: "key"='val\'ue1;value2' ; comment
matches!
key is key
value is val'ue1;value2
comment is comment


================
line: "key"='val\"ue1;value2' ; comment
matches!
key is key
value is val"ue1;value2
comment is comment

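The same token pattern shows up three times in the script above (once in
dequote, twice in the line match). As a rough sketch only, assuming nothing
beyond core Perl, it could be compiled once with qr// and interpolated where
needed. The names $TOKEN and parse_kv are made up for this illustration and
are not in the original script; this dequote also returns the new string
instead of modifying its argument.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# One token: an unquoted character, a backslash escape, or a quoted string.
# $TOKEN and parse_kv are hypothetical names used only in this sketch.
my $TOKEN = qr{
    [^=\\"';\s]           # plain character
  | \\.                   # backslash-escaped character
  | "(?:[^"\\]|\\.)*"     # double-quoted string
  | '(?:[^'\\]|\\.)*'     # single-quoted string
}x;

# Remove quotes and backslash escapes from a run of tokens.
sub dequote {
    my ($quoted) = @_;
    my $out = '';
    while ($quoted =~ /($TOKEN)/g) {
        my $piece = $1;
        if ($piece =~ /^\\(.)$/) {
            $out .= $1;                          # escaped character
        } elsif ($piece =~ /^(["'])(.*)\1$/s) {
            (my $inner = $2) =~ s/\\(.)/$1/g;    # unescape inside the quotes
            $out .= $inner;
        } else {
            $out .= $piece;                      # plain character
        }
    }
    return $out;
}

# Returns (key, value, comment), or nothing if the line does not match.
sub parse_kv {
    my ($line) = @_;
    return unless $line =~ m{
        ^ ((?:$TOKEN)+) \s* = \s* ((?:$TOKEN)+) (?: \s* ; \s* (.*) )? $
    }x;
    my ($key, $value, $comment) = ($1, $2, $3);
    return (dequote($key), dequote($value), defined $comment ? $comment : '');
}

my ($k, $v, $c) = parse_kv(q{"key"='val\"ue1;value2' ; comment});
print "key is $k\nvalue is $v\ncomment is $c\n";
```

Keeping the pattern in one place means a change to the quoting rules only has
to be made once, and the line match stays short enough to read.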

On Wed, Mar 9, 2011 at 12:07 PM, Rob Janes <janes.rob at gmail.com> wrote:
> seems to work ...
>
> #!/usr/bin/env perl
>
> while (<>) {
>  chomp;
>
>  print "\n\n================\nline: $_\n";
>
>  if (m{
>        ^
>        (
>          (?:[^=\\"';\s]|
>            \\.|
>            "(?:[^"\\]|\\.)*"|
>            '(?:[^'\\]|\\.)*'
>          )+
>        )
>         \s* = \s*
>        (
>          (?:[^=\\"';\s]|
>            \\.|
>            "(?:[^"\\]|\\.)*"|
>            '(?:[^'\\]|\\.)*'
>            )+
>        )
>        (?:\s*;\s*(.*))?
>        $
>        }x)
>  {
>     print "matches!\n";
>     print "key is $1\nvalue is $2\ncomment is $3\n";
>  }
>  else
>  {
>     print "match FAILED!\n";
>  }
> }
>
> results:
>
> ================
> line: key="value"
> matches!
> key is key
> value is "value"
> comment is
>
>
> ================
> line: key=value
> matches!
> key is key
> value is value
> comment is
>
>
> ================
> line: key="value1;value2"
> matches!
> key is key
> value is "value1;value2"
> comment is
>
>
> ================
> line: key="value1;value2"     ; comment
> matches!
> key is key
> value is "value1;value2"
> comment is comment
>
>
> ================
> line: key='value1;value2'     ; comment
> matches!
> key is key
> value is 'value1;value2'
> comment is comment
>
>
> ================
> line: "key"="value1"
> matches!
> key is "key"
> value is "value1"
> comment is
>
>
> ================
> line: "key"="value1"          ; comment
> matches!
> key is "key"
> value is "value1"
> comment is comment
>
>
> ================
> line: "key"="value1;value2"
> matches!
> key is "key"
> value is "value1;value2"
> comment is
>
>
> ================
> line: "key"="value1;value2"   ; comment
> matches!
> key is "key"
> value is "value1;value2"
> comment is comment
>
>
> ================
> line: "key"="val\"ue1;value2"
> matches!
> key is "key"
> value is "val\"ue1;value2"
> comment is
>
>
> ================
> line: "key"="val\"ue1;value2" ; comment
> matches!
> key is "key"
> value is "val\"ue1;value2"
> comment is comment
>
>
> ================
> line: "key"='val\'ue1;value2' ; comment
> matches!
> key is "key"
> value is 'val\'ue1;value2'
> comment is comment
>
>
> ================
> line: "key"='val\"ue1;value2' ; comment
> matches!
> key is "key"
> value is 'val\"ue1;value2'
> comment is comment
>
>
> On Wed, Mar 9, 2011 at 11:15 AM, Fulko Hew <fulko.hew at gmail.com> wrote:
>> Today's regex problem is a standard problem, that has standard answers,
>> I just don't know them, and I haven't found them...
>>
>> "Strip comments from lines while obeying quoted strings".
>>
>> I've got a fellow who's dealing with 'registry files', but needs to
>> strip comments off of the end of the lines (I.e. anything after a semicolon)
>> but preserve semicolons (and text) if found within a quoted string.
>>
>> I know there are hard ways to do it, but there's got to be a module
>> to handle it for me.
>>
>> Here are some example strings:
>>
>> key="value"
>> key=value
>> key="value1;value2"
>> key="value1;value2"     ; comment
>> key='value1;value2'     ; comment
>> "key"="value1"
>> "key"="value1"          ; comment
>> "key"="value1;value2"
>> "key"="value1;value2"   ; comment
>> "key"="val\"ue1;value2"
>> "key"="val\"ue1;value2" ; comment
>> "key"='val\'ue1;value2' ; comment
>> "key"='val\"ue1;value2' ; comment
>>
>>
>> Suggestions?
>>
>> Fulko
>>
>>
>> _______________________________________________
>> toronto-pm mailing list
>> toronto-pm at pm.org
>> http://mail.pm.org/mailman/listinfo/toronto-pm
>>
>>
>
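On the "there's got to be a module" part: if only the comment has to go and
the quotes can stay, the core module Text::ParseWords might be enough. The
following is a sketch under that assumption, not a tested drop-in;
strip_comment is a made-up helper name. parse_line splits on the delimiter
only outside quotes, and a true second argument keeps quotes and backslashes
as they are.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# strip_comment is a hypothetical helper name for this sketch.
# Split on semicolons that sit outside quotes and keep only the first field.
sub strip_comment {
    my ($line) = @_;
    my @fields = parse_line(';', 1, $line);
    return $line unless @fields;    # could not be parsed; leave it alone
    my $content = $fields[0];
    $content = '' unless defined $content;
    $content =~ s/\s+$//;           # drop blanks that preceded the comment
    return $content;
}

print strip_comment($_), "\n" for (
    q{key="value1;value2"     ; comment},
    q{"key"='val\'ue1;value2' ; comment},
    q{key=value},
);
```

One caveat, as far as I can tell: a comment that itself contains an unbalanced
quote (say, "don't") can make parse_line give up and return an empty list,
which is why the sketch falls back to returning the line unchanged.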

