[tpm] I wish I was better at regex's

Rob Janes janes.rob at gmail.com
Wed Mar 9 11:38:05 PST 2011


This one strips off the trailing comment while leaving quoted strings and backslash escapes alone:

#!/usr/bin/env perl

use strict;
use warnings;

while (<DATA>) {
  chomp;

  print "\n\n================\nline: $_\n";

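  if (m{
        ^
        (                        # capture 1: the data before any comment
          (?:[^\\"';\s]|         # an ordinary character
            \\.|                 # a backslash escape outside quotes
            \s|                  # whitespace
            "(?:[^"\\]|\\.)*"|   # a double-quoted string, escapes allowed
            '(?:[^'\\]|\\.)*'    # a single-quoted string, escapes allowed
          )+
        )
        (?:\s*;\s*(.*))?         # optional comment, text in capture 2
        $
        }x)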
  {
    my ($words, $comment) = ($1, $2);
    $comment = "" unless defined($comment);
    print "matches!\n";

    print "data is $words\ncomment is $comment\n";
  }
  else
  {
    print "match FAILED!\n";
  }
}

__DATA__
key="value"
key=value
key="value;"
key="value1;value2"
key="value1;value2"     ; comment
key='value1;value2'     ; comment
"key"="value1"
"key"="value1"          ; comment
"key"="value1;value2"
"key"="value1;value2"   ; comment
"key"="val\"ue1;value2"
"key"="val\"ue1;value2" ; comment
"key"='val\'ue1;value2' ; comment
"key"='val\"ue1;value2' ; comment
key="this=that" ; an = in the value
key="value" ; a " in the comment
this is a title   ; and this is a comment
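
For what it's worth, here is a rough sketch of the same pattern wrapped in a sub, so it can be called one line at a time. The name strip_comment is my own and not from the thread, and so are two small changes: the data group uses * instead of + so a comment-only line still matches, and trailing whitespace is trimmed from the data. Treat it as a sketch under those assumptions, not a finished parser.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Same alternatives as the pattern above, but the data part may be empty.
my $SPLIT = qr{
    ^
    (                          # capture 1: data, possibly empty
      (?: [^\\"';\s]           # ordinary character
        | \\.                  # backslash escape outside quotes
        | \s                   # whitespace
        | "(?:[^"\\]|\\.)*"    # double-quoted string
        | '(?:[^'\\]|\\.)*'    # single-quoted string
      )*
    )
    (?: ; \s* (.*) )?          # optional comment in capture 2
    $
}x;

# Hypothetical helper: returns (data, comment), or nothing when the line
# does not parse (for example an unterminated quote).
sub strip_comment {
    my ($line) = @_;
    my ($data, $comment) = $line =~ $SPLIT
        or return;
    $data =~ s/\s+$//;         # drop the padding before the semicolon
    return ($data, defined $comment ? $comment : "");
}

for my $line ('key="value1;value2"     ; comment',
              q{"key"='val\'ue1;value2' ; comment},
              '; only a comment') {
    my ($data, $comment) = strip_comment($line) or next;
    print "data is [$data], comment is [$comment]\n";
}
```

Returning an empty list on failure lets the caller decide whether a malformed line should be skipped or reported.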


On Wed, Mar 9, 2011 at 2:33 PM, Shaun Fryer <sfryer at sourcery.ca> wrote:
> Indeed. Or:
>
> "key=\'stuff\'"="value1=\'stuff\',value2=\'more ;stuff\'" ; comment with " ' and ;
> --
> Shaun Fryer
> cell: 1-647-709-6509
> voip: 1-647-723-2729
>
>
>
>
> On Wed, Mar 9, 2011 at 2:30 PM, Rob Janes <janes.rob at gmail.com> wrote:
>> there's
>>
>> key="this=that"  ; an = in the value
>>
>> and
>>
>> key="value" ; a " in the comment
>>
>> On Wed, Mar 9, 2011 at 2:16 PM, Shaun Fryer <sfryer at sourcery.ca> wrote:
>>> my $strip = qr{;[^\;]+$};
>>> while (<DATA>) {
>>>    chomp;
>>>    my ($key, $val) = split /=/;
>>>    my ($quote) = ($val =~ m{^(["'])}g);
>>>    if ($quote) {
>>>        ($val) = ($val =~ m{^($quote[^\b]+($quote))}g);
>>>    }
>>>    else {
>>>        $val =~ s/$strip//;
>>>    }
>>>    print $key, '=', $val, "\n";
>>> }
>>>
>>> __DATA__
>>> key="value"
>>> key=value
>>> key="value;"
>>> key="value1;value2"
>>> key="value1;value2"     ; comment
>>> key='value1;value2'     ; comment
>>> "key"="value1"
>>> "key"="value1"          ; comment
>>> "key"="value1;value2"
>>> "key"="value1;value2"   ; comment
>>> "key"="val\"ue1;value2"
>>> "key"="val\"ue1;value2" ; comment
>>> "key"='val\'ue1;value2' ; comment
>>> "key"='val\"ue1;value2' ; comment
>>>
>>> On Wed, Mar 9, 2011 at 2:12 PM,  <daniel at benoy.name> wrote:
>>>> Doesn't work.  Output:
>>>>
>>>> ----
>>>> key="value"
>>>> key=value
>>>> key="value1
>>>> key="value1;value2"
>>>> key='value1;value2'
>>>> "key"="value1"
>>>> "key"="value1"
>>>> "key"="value1
>>>> "key"="value1;value2"
>>>> "key"="val\"ue1
>>>> "key"="val\"ue1;value2"
>>>> "key"='val\'ue1;value2'
>>>> "key"='val\"ue1;value2'
>>>> ----
>>>>
>>>> Look at line 3.
>>>>
>>>> Also it wouldn't catch a trailing semicolon with nothing after it.
>>>>
>>>> Here's a quick and dirty improvement, but it will still have problems:
>>>>
>>>> my $strip = qr{;[^\;\"\']*$};
>>>> while (<DATA>) {
>>>>    chomp;
>>>>    $_ =~ s/$strip//;
>>>>    print $_, "\n";
>>>> }
>>>>
>>>>
>>>> Here's the way I would do it:
>>>>
>>>> ----
>>>> while (<DATA>) {
>>>>    chomp;
>>>>
>>>>    my $stripped;
>>>>    my $quotechar = "";
>>>>    foreach my $char (split(//, $_)) {
>>>>        if ($quotechar) { # We're currently quoted
>>>>            if ($char eq $quotechar) { # end of quote
>>>>                $quotechar = "";
>>>>            }
>>>>        } else { # We're not currently quoted
>>>>            if ($char eq ';') { # The comment has begun!
>>>>                last();
>>>>            } elsif ($char eq '"' || $char eq "'") { # start of quote
>>>>                $quotechar = $char;
>>>>            }
>>>>        }
>>>>        $stripped .= $char;
>>>>    }
>>>>    print "$stripped\n";
>>>> }
>>>>
>>>> __DATA__
>>>> key="value"
>>>> key=value
>>>> key="value1;value2"
>>>> key="value1;value2"     ; comment
>>>> key='value1;value2'     ; comment
>>>> "key"="value1"
>>>> "key"="value1"          ; comment
>>>> "key"="value1;value2"
>>>> "key"="value1;value2"   ; comment
>>>> "key"="val\"ue1;value2"
>>>> "key"="val\"ue1;value2" ; comment
>>>> "key"='val\'ue1;value2' ; comment
>>>> "key"='val\"ue1;value2' ; comment
>>>> ----
>>>>
>>>> On Wed, 9 Mar 2011 13:48:39 -0500, Shaun Fryer wrote:
>>>>>
>>>>> my $strip = qr{;[^\;]+$};
>>>>> while (<DATA>) {
>>>>>    chomp;
>>>>>    $_ =~ s/$strip//;
>>>>>    print $_, "\n";
>>>>> }
>>>>>
>>>>> __DATA__
>>>>> key="value"
>>>>> key=value
>>>>> key="value1;value2"
>>>>> key="value1;value2"     ; comment
>>>>> key='value1;value2'     ; comment
>>>>> "key"="value1"
>>>>> "key"="value1"          ; comment
>>>>> "key"="value1;value2"
>>>>> "key"="value1;value2"   ; comment
>>>>> "key"="val\"ue1;value2"
>>>>> "key"="val\"ue1;value2" ; comment
>>>>> "key"='val\'ue1;value2' ; comment
>>>>> "key"='val\"ue1;value2' ; comment
>>>>>
>>>>> On Wed, Mar 9, 2011 at 1:39 PM, Fulko Hew <fulko.hew at gmail.com> wrote:
>>>>>>
>>>>>> On Wed, Mar 9, 2011 at 1:14 PM, Rob Janes <janes.rob at gmail.com> wrote:
>>>>>>>
>>>>>>> here's one that dequotes the key and value ...
>>>>>>
>>>>>> Thanks for the ideas, but the issue isn't with extracting the keys and
>>>>>> values (and dequoting them). The task was only to strip trailing
>>>>>> comments (while obeying quoted strings).
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> toronto-pm mailing list
>>>>>> toronto-pm at pm.org
>>>>>> http://mail.pm.org/mailman/listinfo/toronto-pm
>>>>>>
>>>>>>
>>
>
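
One more note on the character-by-character loop quoted above ("Here's the way I would do it"): as written it ends a quoted string at \" and so cuts lines like "key"="val\"ue1;value2" short. Below is a sketch of an escape-aware variant. The sub name strip_comment_scan is hypothetical, and it assumes a backslash escapes the next character both inside and outside quotes, which is how the regex at the top of this message treats it.

```perl
use strict;
use warnings;

# Hypothetical helper: scan one character at a time, tracking the open
# quote and whether the previous character was a backslash.
sub strip_comment_scan {
    my ($line) = @_;
    my $stripped  = "";
    my $quotechar = "";    # the quote we are inside, or "" when unquoted
    my $escaped   = 0;     # true when the previous character was a backslash

    for my $char (split //, $line) {
        if ($escaped) {                 # escaped character: copy it verbatim
            $escaped = 0;
        }
        elsif ($char eq "\\") {         # start of an escape sequence
            $escaped = 1;
        }
        elsif ($quotechar) {            # inside a quoted string
            $quotechar = "" if $char eq $quotechar;
        }
        elsif ($char eq ';') {          # unquoted semicolon: comment starts
            last;
        }
        elsif ($char eq '"' || $char eq "'") {
            $quotechar = $char;         # start of a quoted string
        }
        $stripped .= $char;
    }
    $stripped =~ s/\s+$//;              # drop padding left before the comment
    return $stripped;
}

for my $line (q{"key"="val\"ue1;value2" ; comment},
              q{key='value1;value2'     ; comment},
              q{key=value}) {
    print strip_comment_scan($line), "\n";
}
```

Unlike the regex, this version never fails to match: an unterminated quote simply swallows the rest of the line, so which behaviour is preferable depends on how strict the input format should be.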

