[tpm] I wish I was better at regex's

Rob Janes janes.rob at gmail.com
Wed Mar 9 12:36:46 PST 2011


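i recall the compsci proof (the pumping lemma) that a true regular
expression cannot match arbitrarily nested patterns, like
(xxx (yyy) zzz).  for that you need a real parser: a recursive-descent
one like Parse::RecDescent, or an lalr parser generator.  (perl 5.10+
regexes can recurse with (?R), but that goes beyond regular expressions
in the formal sense.)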
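
to make the nesting point concrete, here is a minimal recursive-descent
sketch in python (the same shape works in perl).  the function name
parse_group and the list-of-lists tree it returns are made up for
illustration, not taken from any module.  the recursion is what tracks
depth, which is exactly what a plain regex has no way to do.

```python
def parse_group(text, pos=0):
    """Parse one '(' ... ')' group starting at text[pos].

    Returns (tree, next_pos), where tree is a list holding plain-text
    chunks and nested lists for inner groups. Hypothetical helper,
    for illustration only.
    """
    if pos >= len(text) or text[pos] != "(":
        raise ValueError(f"expected '(' at position {pos}")
    pos += 1                      # step over the opening paren
    items, buf = [], []
    while pos < len(text):
        ch = text[pos]
        if ch == "(":
            if buf:               # flush text collected so far
                items.append("".join(buf))
                buf = []
            child, pos = parse_group(text, pos)   # recurse for the inner group
            items.append(child)
        elif ch == ")":
            if buf:
                items.append("".join(buf))
            return items, pos + 1  # step over the closing paren
        else:
            buf.append(ch)
            pos += 1
    raise ValueError("unbalanced: missing ')'")


tree, end = parse_group("(xxx (yyy) zzz)")
print(tree)   # ['xxx ', ['yyy'], ' zzz']
```

an unbalanced input such as "(xxx (yyy zzz)" raises ValueError instead
of silently matching a prefix.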

however, a regex can handle single-depth quoted expressions, including
backslash-escaped quotes.  the regex might be long and ugly, but it can
do it.

so the quoting in shell, ms ini files, c, javascript, and xml
attribute values can each be handled by a single regex.  (nested xml
elements are a different story; they need a parser.)
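
as a rough sketch of what i mean, here is one regex (in python; the
pattern uses only syntax that perl shares) for the "key"="value" ; comment
lines from this thread.  the exact line format, the names QUOTED, LINE and
parse_line, and the choice to treat backslash as the escape character are
all my assumptions, not a spec.

```python
import re

# One double-quoted string: any run of non-quote, non-backslash characters,
# or a backslash followed by any single character (so \" stays inside).
QUOTED = r'"((?:[^"\\]|\\.)*)"'

# "key" = "value", then an optional ; comment running to end of line.
LINE = re.compile(r'\s*' + QUOTED + r'\s*=\s*' + QUOTED + r'\s*(?:;(.*))?$')


def parse_line(line):
    """Return (key, value, comment), or None if the line does not match."""
    m = LINE.match(line)
    if m is None:
        return None
    key, value, comment = m.groups()

    def unescape(s):
        # Drop the backslash from each escaped character.
        return re.sub(r'\\(.)', r'\1', s)

    return unescape(key), unescape(value), (comment or "").strip()


print(parse_line(r' "key"="Val\"ue" ; tricked you!!'))
# ('key', 'Val"ue', 'tricked you!!')
```

a semicolon inside the quotes stays part of the value, and quotes inside
the comment are left alone, because the comment part is only tried after
both quoted strings have been consumed.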

On Wed, Mar 9, 2011 at 2:42 PM, Richard Dice <richard.dice at gmail.com> wrote:
> Those are getting closer, but honestly this can go way beyond the realm of what regexps can provide.  For example, the text in the comment area... In theory, it could contain text like
>
>  "key"="Val\"ue" ; tricked you!!
>
> To fully, correctly solve this you need a full parser.  Like Parse::RecDescent
>
> Which is like hitting a melon with an atomic bomb.
>
> If you have ANY foreknowledge of the sorts of files you are going to receive and the kind of comments they will (NOT "might, theoretically") contain, just cook up some cheap-ass line mangling logic, maybe using some regexps, maybe some substr and index and rindex.
>
> On 2011-03-09, at 2:32 PM, Fulko Hew <fulko.hew at gmail.com> wrote:
>
>> On Wed, Mar 9, 2011 at 2:29 PM, Shaun Fryer <sfryer at sourcery.ca> wrote:
>>>
>>> It's very difficult to do without matching quotes, and hence key/value
>>> pairs. For instance, what if the comment contains a quote, just as the
>>> quotes may contain a semi-colon? In this case I've sidestepped the
>>> problem of escaped quotes by using a word-boundary (\b) match inside
>>> any matched quotes. the side effect of this, is that any spaces inside
>>> the quotes will count...
>>>
>>> good luck if you can do it without matching keys/values/quotes. I've
>>> already got beers riding on the solution. ;)
>>
>> That's why I then looked at things like:
>>
>> Regexp::Common::balanced
>> Regexp::Common::comment
>>
>> but I couldn't figure out how to apply them.
>> _______________________________________________
>> toronto-pm mailing list
>> toronto-pm at pm.org
>> http://mail.pm.org/mailman/listinfo/toronto-pm

