[tpm] I wish I was better at regex's

Rob Janes janes.rob at gmail.com
Wed Mar 9 11:46:53 PST 2011


================
line: "key=\'stuff\'" = "value1=\'stuff\',value2=\'more ;stuff\'" ; comment
matches!
data is "key=\'stuff\'" = "value1=\'stuff\',value2=\'more ;stuff\'"
comment is comment

That line has a key (in double quotes), an =, a value (in double quotes),
and a comment.

I don't understand. It looks to me like it did pull off the comment properly.
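
For anyone who wants to repeat the check, here is a minimal sketch that wraps
the same pattern in a helper. The name split_comment is hypothetical and is
not part of the script quoted below. The second test line is an assumption: it
treats the two quoted lines of the "breaking" example as one line that the
mailer wrapped, so the comment itself contains " ' and ;.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not in the quoted script): returns (data, comment),
# or an empty list when the line does not match the pattern.
sub split_comment {
    my ($line) = @_;
    return unless $line =~ m{
        ^
        (
          (?: [^\\"';\s]            # ordinary character
            | \\.                   # backslash-escaped character
            | \s                    # whitespace
            | "(?:[^"\\]|\\.)*"     # double-quoted string
            | '(?:[^'\\]|\\.)*'     # single-quoted string
          )+
        )
        (?: \s* ; \s* (.*) )?       # optional trailing comment
        $
    }x;
    return ($1, defined($2) ? $2 : "");
}

my @lines = (
    # The line as tested above.
    q{"key=\'stuff\'" = "value1=\'stuff\',value2=\'more ;stuff\'" ; comment},
    # Assumed unwrapped form, with quotes and a semicolon inside the comment.
    q{"key=\'stuff\'" = "value1=\'stuff\',value2=\'more ;stuff\'" ; comment with " ' and ;},
);

for my $line (@lines) {
    my ($data, $comment) = split_comment($line);
    if (defined($data)) {
        print "data is $data\ncomment is $comment\n";
    }
    else {
        print "match FAILED!\n";
    }
}
```

Both forms should match, because once the first unquoted ; is found the rest
of the line is taken as the comment whatever it contains. Note that the data
part keeps any whitespace that sits before the ;.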

On Wed, Mar 9, 2011 at 2:43 PM, Shaun Fryer <sfryer at sourcery.ca> wrote:
> nice (though pretty convoluted). the following breaks it.
>
> "key=\'stuff\'" = "value1=\'stuff\',value2=\'more ;stuff\'" ; comment
> with " ' and ;
> --
> Shaun Fryer
> cell: 1-647-709-6509
> voip: 1-647-723-2729
>
> On Wed, Mar 9, 2011 at 2:38 PM, Rob Janes <janes.rob at gmail.com> wrote:
>> this strips off the comment
>>
>> #!/usr/bin/env perl
>>
>> use strict;
>> use warnings;
>>
>> while (<DATA>) {
>>  chomp;
>>
>>  print "\n\n================\nline: $_\n";
>>
>>  if (m{
>>        ^
>>        (
>>          (?:[^\\"';\s]|
>>            \\.|
>>            \s|
>>            "(?:[^"\\]|\\.)*"|
>>            '(?:[^'\\]|\\.)*'
>>          )+
>>        )
>>        (?:\s*;\s*(.*))?
>>        $
>>        }x)
>>  {
>>    my ($words, $comment) = ($1, $2);
>>    $comment = "" unless defined($comment);
>>    print "matches!\n";
>>
>>    print "data is $words\ncomment is $comment\n";
>>  }
>>  else
>>  {
>>     print "match FAILED!\n";
>>  }
>> }
>>
>> __DATA__
>> key="value"
>> key=value
>> key="value;"
>> key="value1;value2"
>> key="value1;value2"     ; comment
>> key='value1;value2'     ; comment
>> "key"="value1"
>> "key"="value1"          ; comment
>> "key"="value1;value2"
>> "key"="value1;value2"   ; comment
>> "key"="val\"ue1;value2"
>> "key"="val\"ue1;value2" ; comment
>> "key"='val\'ue1;value2' ; comment
>> "key"='val\"ue1;value2' ; comment
>> key="this=that" ; an = in the value
>> key="value" ; a " in the comment
>> this is a title   ; and this is a comment
>>
>>
>> On Wed, Mar 9, 2011 at 2:33 PM, Shaun Fryer <sfryer at sourcery.ca> wrote:
>>> Indeed. Or "key=\'stuff\'"="value1=\'stuff\',value2=\'more ;stuff\'" ;
>>> comment with " ' and ;
>>>
>>> On Wed, Mar 9, 2011 at 2:30 PM, Rob Janes <janes.rob at gmail.com> wrote:
>>>> there's
>>>>
>>>> key="this=that"  ; an = in the value
>>>>
>>>> and
>>>>
>>>> key="value" ; a " in the comment
>>>>
>>>> On Wed, Mar 9, 2011 at 2:16 PM, Shaun Fryer <sfryer at sourcery.ca> wrote:
>>>>> my $strip = qr{;[^\;]+$};
>>>>> while (<DATA>) {
>>>>>    chomp;
>>>>>    my ($key, $val) = split /=/;
>>>>>    my ($quote) = ($val =~ m{^(["'])}g);
>>>>>    if ($quote) {
>>>>>        ($val) = ($val =~ m{^($quote[^\b]+($quote))}g);
>>>>>    }
>>>>>    else {
>>>>>        $val =~ s/$strip//;
>>>>>    }
>>>>>    print $key, '=', $val, "\n";
>>>>> }
>>>>>
>>>>> __DATA__
>>>>> key="value"
>>>>> key=value
>>>>> key="vlue;"
>>>>> key="value1;value2"
>>>>> key="value1;value2"     ; comment
>>>>> key='value1;value2'     ; comment
>>>>> "key"="value1"
>>>>> "key"="value1"          ; comment
>>>>> "key"="value1;value2"
>>>>> "key"="value1;value2"   ; comment
>>>>> "key"="val\"ue1;value2"
>>>>> "key"="val\"ue1;value2" ; comment
>>>>> "key"='val\'ue1;value2' ; comment
>>>>> "key"='val\"ue1;value2' ; comment
>>>>>
>>>>> On Wed, Mar 9, 2011 at 2:12 PM,  <daniel at benoy.name> wrote:
>>>>>> Doesn't work.  Output:
>>>>>>
>>>>>> ----
>>>>>> key="value"
>>>>>> key=value
>>>>>> key="value1
>>>>>> key="value1;value2"
>>>>>> key='value1;value2'
>>>>>> "key"="value1"
>>>>>> "key"="value1"
>>>>>> "key"="value1
>>>>>> "key"="value1;value2"
>>>>>> "key"="val\"ue1
>>>>>> "key"="val\"ue1;value2"
>>>>>> "key"='val\'ue1;value2'
>>>>>> "key"='val\"ue1;value2'
>>>>>> ----
>>>>>>
>>>>>> Look at line 3.
>>>>>>
>>>>>> Also it wouldn't catch a trailing semicolon with nothing after it.
>>>>>>
>>>>>> Here's a quick and dirty improvement, but it will still have problems:
>>>>>>
>>>>>> my $strip = qr{;[^\;\"\']*$};
>>>>>> while (<DATA>) {
>>>>>>    chomp;
>>>>>>    $_ =~ s/$strip//;
>>>>>>    print $_, "\n";
>>>>>> }
>>>>>>
>>>>>>
>>>>>> Here's the way I would do it:
>>>>>>
>>>>>> ----
>>>>>> while (<DATA>) {
>>>>>>    chomp;
>>>>>>
>>>>>>    my $stripped = "";
>>>>>>    my $quotechar = "";
>>>>>>    my $escaped = 0;
>>>>>>    foreach my $char (split(//, $_)) {
>>>>>>        if ($escaped) { # Previous char was a backslash; keep this one as-is
>>>>>>            $escaped = 0;
>>>>>>        } elsif ($char eq '\\') { # A backslash escapes the next character
>>>>>>            $escaped = 1;
>>>>>>        } elsif ($quotechar) { # We're currently quoted
>>>>>>            if ($char eq $quotechar) { # end of quote
>>>>>>                $quotechar = "";
>>>>>>            }
>>>>>>        } elsif ($char eq ';') { # The comment has begun!
>>>>>>            last;
>>>>>>        } elsif ($char eq '"' || $char eq "'") { # start of quote
>>>>>>            $quotechar = $char;
>>>>>>        }
>>>>>>        $stripped .= $char;
>>>>>>    }
>>>>>>    print "$stripped\n";
>>>>>> }
>>>>>>
>>>>>> __DATA__
>>>>>> key="value"
>>>>>> key=value
>>>>>> key="value1;value2"
>>>>>> key="value1;value2"     ; comment
>>>>>> key='value1;value2'     ; comment
>>>>>> "key"="value1"
>>>>>> "key"="value1"          ; comment
>>>>>> "key"="value1;value2"
>>>>>> "key"="value1;value2"   ; comment
>>>>>> "key"="val\"ue1;value2"
>>>>>> "key"="val\"ue1;value2" ; comment
>>>>>> "key"='val\'ue1;value2' ; comment
>>>>>> "key"='val\"ue1;value2' ; comment
>>>>>> ----
>>>>>>
>>>>>> On Wed, 9 Mar 2011 13:48:39 -0500, Shaun Fryer wrote:
>>>>>>>
>>>>>>> my $strip = qr{;[^\;]+$};
>>>>>>> while (<DATA>) {
>>>>>>>    chomp;
>>>>>>>    $_ =~ s/$strip//;
>>>>>>>    print $_, "\n";
>>>>>>> }
>>>>>>>
>>>>>>> __DATA__
>>>>>>> key="value"
>>>>>>> key=value
>>>>>>> key="value1;value2"
>>>>>>> key="value1;value2"     ; comment
>>>>>>> key='value1;value2'     ; comment
>>>>>>> "key"="value1"
>>>>>>> "key"="value1"          ; comment
>>>>>>> "key"="value1;value2"
>>>>>>> "key"="value1;value2"   ; comment
>>>>>>> "key"="val\"ue1;value2"
>>>>>>> "key"="val\"ue1;value2" ; comment
>>>>>>> "key"='val\'ue1;value2' ; comment
>>>>>>> "key"='val\"ue1;value2' ; comment
>>>>>>>
>>>>>>> On Wed, Mar 9, 2011 at 1:39 PM, Fulko Hew <fulko.hew at gmail.com> wrote:
>>>>>>>>
>>>>>>>> On Wed, Mar 9, 2011 at 1:14 PM, Rob Janes <janes.rob at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> here's one that dequotes the key and value ...
>>>>>>>>
>>>>>>>> Thanks for the ideas, but the issue isn't with extracting the keys
>>>>>>>> and the values (and dequoting them); the task was only to strip
>>>>>>>> trailing comments (while obeying quoted strings).
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> toronto-pm mailing list
>>>>>>>> toronto-pm at pm.org
>>>>>>>> http://mail.pm.org/mailman/listinfo/toronto-pm