[tpm] I wish I was better at regex's
Rob Janes
janes.rob at gmail.com
Wed Mar 9 10:14:58 PST 2011
here's one that dequotes the key and value ...
#!/usr/bin/env perl
use strict;
use warnings;
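# Strip quotes and backslash escapes from $_[0] in place.
# Returns the number of tokens processed (0 means nothing was changed).
sub dequote($) {
    my $quoted   = $_[0];
    my $dequoted = "";
    my $counter  = 0;

    # Walk the string one token at a time.
    while ($quoted =~ m{
            (
                (?:[^=\\"';\s]|
                   \\.|
                   "(?:[^"\\]|\\.)*"|
                   '(?:[^'\\]|\\.)*'
                )
            )
        }gx)
    {
        my $piece = $1;
        my $first = substr($piece, 0, 1);
        if ($first eq '\\') {
            # backslash escape: keep only the escaped character
            $dequoted .= substr($piece, 1, 1);
        } elsif ($first eq '"' || $first eq "'") {
            # quoted string: drop the surrounding quotes, then unescape
            my $temp = substr($piece, 1, -1);
            $temp =~ s/\\(.)/$1/g;
            $dequoted .= $temp;
        } else {
            $dequoted .= $piece;
        }
        $counter++;
    }

    # @_ aliases the caller's variable, so this updates it in place.
    $_[0] = $dequoted if $counter;
    return $counter;
}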
while (<>) {
    chomp;

    print "\n\n================\nline: $_\n";

    if (m{
            ^
            (
                (?:[^=\\"';\s]|
                   \\.|
                   "(?:[^"\\]|\\.)*"|
                   '(?:[^'\\]|\\.)*'
                )+
            )
            \s* = \s*
            (
                (?:[^=\\"';\s]|
                   \\.|
                   "(?:[^"\\]|\\.)*"|
                   '(?:[^'\\]|\\.)*'
                )+
            )
            (?:\s*;\s*(.*))?
            $
        }x)
    {
        my ($key, $value, $comment) = ($1, $2, $3);
        # $3 is undef when the line has no comment; avoid a warning below
        $comment = '' unless defined $comment;
        print "matches!\n";
        dequote($key);
        dequote($value);
        print "key is $key\nvalue is $value\ncomment is $comment\n";
    }
    else
    {
        print "match FAILED!\n";
    }
}
and the output is ...
================
line: key="value"
matches!
key is key
value is value
comment is
================
line: key=value
matches!
key is key
value is value
comment is
================
line: key="value1;value2"
matches!
key is key
value is value1;value2
comment is
================
line: key="value1;value2" ; comment
matches!
key is key
value is value1;value2
comment is comment
================
line: key='value1;value2' ; comment
matches!
key is key
value is value1;value2
comment is comment
================
line: "key"="value1"
matches!
key is key
value is value1
comment is
================
line: "key"="value1" ; comment
matches!
key is key
value is value1
comment is comment
================
line: "key"="value1;value2"
matches!
key is key
value is value1;value2
comment is
================
line: "key"="value1;value2" ; comment
matches!
key is key
value is value1;value2
comment is comment
================
line: "key"="val\"ue1;value2"
matches!
key is key
value is val"ue1;value2
comment is
================
line: "key"="val\"ue1;value2" ; comment
matches!
key is key
value is val"ue1;value2
comment is comment
================
line: "key"='val\'ue1;value2' ; comment
matches!
key is key
value is val'ue1;value2
comment is comment
================
line: "key"='val\"ue1;value2' ; comment
matches!
key is key
value is val"ue1;value2
comment is comment
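
The same token pattern is written out three times in the script above. One possible tidy-up, sketched here and not something from the thread itself, is to compile the pattern once with qr{} and interpolate it. The name parse_pair is made up for illustration. It returns the raw (still quoted) key and value, so dequote() would still be applied afterwards.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# One token: a bare character, a backslash escape, or a quoted string.
# Same alternatives as the pattern in the script, compiled once.
my $token = qr{
      [^=\\"';\s]            # ordinary character
    | \\.                    # backslash-escaped character
    | "(?:[^"\\]|\\.)*"      # double-quoted string
    | '(?:[^'\\]|\\.)*'      # single-quoted string
}x;

# parse_pair is a hypothetical helper, not part of the original script.
# Returns (key, value, comment) on a match, or an empty list otherwise.
sub parse_pair {
    my ($line) = @_;
    return unless $line =~ m{
        ^
        ($token+)
        \s* = \s*
        ($token+)
        (?: \s* ; \s* (.*) )?
        $
    }x;
    my ($key, $value, $comment) = ($1, $2, $3);
    # the comment group is optional, so default it to an empty string
    return ($key, $value, defined $comment ? $comment : '');
}

for my $line ('key="value1;value2" ; comment', 'key=value') {
    my ($key, $value, $comment) = parse_pair($line) or next;
    print "key is $key\nvalue is $value\ncomment is $comment\n";
}
```

Keeping the pattern in one place means a change to the quoting rules only has to be made once.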
On Wed, Mar 9, 2011 at 12:07 PM, Rob Janes <janes.rob at gmail.com> wrote:
> seems to work ...
>
> #!/usr/bin/env perl
>
> while (<>) {
> chomp;
>
> print "\n\n================\nline: $_\n";
>
> if (m{
> ^
> (
> (?:[^=\\"';\s]|
> \\.|
> "(?:[^"\\]|\\.)*"|
> '(?:[^'\\]|\\.)*'
> )+
> )
> \s* = \s*
> (
> (?:[^=\\"';\s]|
> \\.|
> "(?:[^"\\]|\\.)*"|
> '(?:[^'\\]|\\.)*'
> )+
> )
> (?:\s*;\s*(.*))?
> $
> }x)
> {
> print "matches!\n";
> print "key is $1\nvalue is $2\ncomment is $3\n";
> }
> else
> {
> print "match FAILED!\n";
> }
> }
>
> results:
>
> ================
> line: key="value"
> matches!
> key is key
> value is "value"
> comment is
>
>
> ================
> line: key=value
> matches!
> key is key
> value is value
> comment is
>
>
> ================
> line: key="value1;value2"
> matches!
> key is key
> value is "value1;value2"
> comment is
>
>
> ================
> line: key="value1;value2" ; comment
> matches!
> key is key
> value is "value1;value2"
> comment is comment
>
>
> ================
> line: key='value1;value2' ; comment
> matches!
> key is key
> value is 'value1;value2'
> comment is comment
>
>
> ================
> line: "key"="value1"
> matches!
> key is "key"
> value is "value1"
> comment is
>
>
> ================
> line: "key"="value1" ; comment
> matches!
> key is "key"
> value is "value1"
> comment is comment
>
>
> ================
> line: "key"="value1;value2"
> matches!
> key is "key"
> value is "value1;value2"
> comment is
>
>
> ================
> line: "key"="value1;value2" ; comment
> matches!
> key is "key"
> value is "value1;value2"
> comment is comment
>
>
> ================
> line: "key"="val\"ue1;value2"
> matches!
> key is "key"
> value is "val\"ue1;value2"
> comment is
>
>
> ================
> line: "key"="val\"ue1;value2" ; comment
> matches!
> key is "key"
> value is "val\"ue1;value2"
> comment is comment
>
>
> ================
> line: "key"='val\'ue1;value2' ; comment
> matches!
> key is "key"
> value is 'val\'ue1;value2'
> comment is comment
>
>
> ================
> line: "key"='val\"ue1;value2' ; comment
> matches!
> key is "key"
> value is 'val\"ue1;value2'
> comment is comment
>
>
> On Wed, Mar 9, 2011 at 11:15 AM, Fulko Hew <fulko.hew at gmail.com> wrote:
>> Today's regex problem is a standard problem, that has standard answers,
>> I just don't know them, and I haven't found them...
>>
>> "Strip comments from lines while obeying quoted strings".
>>
>> I've got a fellow who's dealing with 'registry files', but needs to
>> strip comments off of the end of the lines (I.e. anything after a semicolon)
>> but preserve semicolons (and text) if found within a quoted string.
>>
>> I know there are hard ways to do it, but there's got to be a module
>> to handle it for me.
>>
>> Here are some example strings:
>>
>> key="value"
>> key=value
>> key="value1;value2"
>> key="value1;value2" ; comment
>> key='value1;value2' ; comment
>> "key"="value1"
>> "key"="value1" ; comment
>> "key"="value1;value2"
>> "key"="value1;value2" ; comment
>> "key"="val\"ue1;value2"
>> "key"="val\"ue1;value2" ; comment
>> "key"='val\'ue1;value2' ; comment
>> "key"='val\"ue1;value2' ; comment
>>
>>
>> Suggestions?
>>
>> Fulko
>
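
On the question of a module: one candidate is the core module Text::ParseWords, whose parse_line() splits on a delimiter while respecting quotes and backslash escapes. The following is only a rough sketch under that assumption. strip_comment is a made-up name, and it only removes the comment; it does not split the key from the value.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# strip_comment is a hypothetical helper for illustration.
# Returns the line without its trailing "; comment", or undef
# if parse_line could not parse the line.
sub strip_comment {
    my ($line) = @_;
    # keep=1 leaves quotes and backslashes in the returned pieces;
    # the first piece is everything before the first unquoted semicolon
    my ($data) = parse_line(';', 1, $line);
    return unless defined $data;
    $data =~ s/\s+$//;    # trim whitespace left in front of the semicolon
    return $data;
}

for my $line ('key="value1;value2" ; comment', 'key=value') {
    my $data = strip_comment($line);
    print defined $data ? "data is $data\n" : "could not parse: $line\n";
}
```

One caveat: as far as I can tell, parse_line() gives up and returns an empty list when it meets an unbalanced quote, so a comment containing an apostrophe would make the whole line fail here, whereas the regex above accepts anything after the semicolon.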