<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
Sorry to chip in late, but this actually feels like a tokenizing
problem, which is part-way to Richard's point. I do a lot of these,
and there is a pattern in the perldocs, specifically under "What
good is \G in a regular expression?" in perlfaq6. This would go
something like this (*** warning untested code ***)<br>
<br>
use Carp; # provides croak<br>
<br>
# Tokenizes $_; each successful match restarts the loop at the new position<br>
while (1) {<br>
m{\G\s*;[^\n]*}gcx && next; # Don't print when we matched a comment<br>
m{\G(=)}gcx && do { print $1; next; };<br>
m{\G(\s+)}gcx && do { print $1; next; };<br>
m{\G(\w+)}gcx && do { print $1; next; };<br>
m{\G(\"(?:\\.|[^\\\"])*\")}gcx && do { print $1; next; };<br>
m{\G(\'(?:\\.|[^\\\'])*\')}gcx && do { print $1; next; };<br>
m{\G\z}gcx && last; # End of input<br>
croak("Unprocessed input");<br>
}<br>
<br>
The print calls could, of course, be replaced with code that stores each
matched token somewhere else, e.g., by pushing it onto an output array to be
joined at the end.<br>
<br>
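As a rough sketch of that array-collecting variant (equally hedged: the sub name strip_comments and the @out array are my own placeholders, and the token set is just the one from the loop above, so any other character, such as a bare comma, ends up in the croak):<br>
<br>
<pre>
```perl
use strict;
use warnings;
use Carp qw(croak);

# strip_comments() is a hypothetical helper name used for illustration.
# It walks the input one token at a time and returns the text with
# ';' comments dropped; quoted strings are copied through untouched.
sub strip_comments {
    my ($text) = @_;
    my @out;                 # tokens we decided to keep
    pos($text) = 0;          # start tokenizing at the beginning

    while (1) {
        # Comment: optional leading whitespace, ';', rest of the line. Not kept.
        $text =~ m{\G\s*;[^\n]*}gc           and next;
        $text =~ m{\G(=)}gc                  and do { push @out, $1; next };
        $text =~ m{\G(\s+)}gc                and do { push @out, $1; next };
        $text =~ m{\G(\w+)}gc                and do { push @out, $1; next };
        # Quoted strings, allowing backslash escapes inside them.
        $text =~ m{\G("(?:\\.|[^\\"])*")}gcs and do { push @out, $1; next };
        $text =~ m{\G('(?:\\.|[^\\'])*')}gcs and do { push @out, $1; next };
        # Nothing left to read: we are done.
        $text =~ m{\G\z}gc                   and last;
        croak "Unprocessed input at offset " . pos($text);
    }
    return join '', @out;
}
```
</pre>
Calling strip_comments() on a line such as name = "a;b" ; trailing comment should keep the quoted "a;b" intact and drop everything from the whitespace before the second ';' to the end of that line.<br>
<br>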
This has the benefit that it isn't all one huge regex, but it is
slower. Essentially, the idea is simple: \G anchors the match at the
current position, i.e., where the previous /g match on the string left
off, and the /c flag keeps that position in place when a match fails.
Each line then tries a different type of token at that position, so
strings are handled in separate regexes from words, comments, etc.
Keeping comment handling apart from quote handling does improve
maintainability. <br>
<br>
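To make the \G and /c behaviour a bit more concrete, here is a minimal sketch; the variable names ($input, $other, @trace) are only for illustration:<br>
<br>
<pre>
```perl
use strict;
use warnings;

my $input = 'foo = 42';
my @trace;

# With /g in scalar context, the match resumes at pos($input);
# \G pins the match to exactly that position.
$input =~ m{\G(\w+)}gc or die "expected a word";
push @trace, "word '$1' ends at " . pos($input);

# The next character is a space, so a digit match fails here.
# Because of /c, pos() is left where it was instead of being reset.
my $matched = $input =~ m{\G(\d+)}gc;
push @trace, "digits matched: " . ($matched ? 'yes' : 'no')
           . ", pos is still " . pos($input);

# A different token type can now be tried at the same position.
$input =~ m{\G\s+}gc or die "expected whitespace";
push @trace, "after whitespace pos is " . pos($input);

# Contrast: without /c, a failed /g match resets pos() to undef.
my $other = 'foo';
$other =~ m{\G\w+}g or die "expected a word";
my $second = $other =~ m{\G\d+}g;
push @trace, "without /c pos is "
           . (defined pos($other) ? pos($other) : 'undef');

print "$_\n" for @trace;
```
</pre>
The point of the contrast at the end is that /c is what lets the loop try several token types in turn without losing its place.<br>
<br>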
As has been said, it is possible to do this in a single regex (even
with nested structures in Perl 5.10+), but the result can be an
unreadable mess. Believe me, I've written some like that. There is also
a significant risk of hitting serious performance issues. A complex
regex can quickly degrade if backtracking and lazy quantifiers aren't
handled right, and you can end up with truly bad performance. The
approach above will impose a small hit, but it usually prevents
pathologically bad matching. <br>
<br>
All the best<br>
Stuart<br>
<br>
On 09/03/2011 4:19 PM, J. Bobby Lopez wrote:
<blockquote
cite="mid:AANLkTi=es+cFw40e9hhi7Ujn1=2-xRJh+=DG9FeUhAQp@mail.gmail.com"
type="cite">I would expect that you can just count the number of
';' instances in the string, and get the index of the last
instance which resides after the last instance of the last single
or double quote. If there are no quotes, then it's the first
instance of ';'.<br>
<br>
<div class="gmail_quote">On Wed, Mar 9, 2011 at 4:14 PM, Uri
Guttman <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:uri@stemsystems.com">uri@stemsystems.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="border-left: 1px solid
rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left:
1ex;">
>>>>> "RJ" == Rob Janes <<a
moz-do-not-send="true" href="mailto:janes.rob@gmail.com">janes.rob@gmail.com</a>>
writes:<br>
<br>
RJ> i recall some compsci proof that regex cannot do
nested pattern<br>
RJ> matching, like (xxx) or (xxx (yyy) zzz). for that you
need a lalr<br>
RJ> parser, something like recdescent or whatever.<br>
<br>
that is true for pure regexes. perl's latest can match nested
pairs. it<br>
isn't trivial but the feature is in there and documented.
regardless,<br>
this problem is very easy to solve with text::balanced and
some basic<br>
code. just a single regex is the wrong solution.<br>
<div class="im"><br>
uri<br>
<br>
--<br>
Uri Guttman ------ <a moz-do-not-send="true"
href="mailto:uri@stemsystems.com">uri@stemsystems.com</a>
-------- <a moz-do-not-send="true"
href="http://www.sysarch.com" target="_blank">http://www.sysarch.com</a>
--<br>
----- Perl Code Review , Architecture, Development,
Training, Support ------<br>
--------- Gourmet Hot Cocoa Mix ---- <a
moz-do-not-send="true" href="http://bestfriendscocoa.com"
target="_blank">http://bestfriendscocoa.com</a> ---------<br>
_______________________________________________<br>
</div>
<div>
<div class="h5">toronto-pm mailing list<br>
<a moz-do-not-send="true" href="mailto:toronto-pm@pm.org">toronto-pm@pm.org</a><br>
<a moz-do-not-send="true"
href="http://mail.pm.org/mailman/listinfo/toronto-pm"
target="_blank">http://mail.pm.org/mailman/listinfo/toronto-pm</a><br>
</div>
</div>
</blockquote>
</div>
<br>
<pre wrap="">
<fieldset class="mimeAttachmentHeader"></fieldset>
_______________________________________________
toronto-pm mailing list
<a class="moz-txt-link-abbreviated" href="mailto:toronto-pm@pm.org">toronto-pm@pm.org</a>
<a class="moz-txt-link-freetext" href="http://mail.pm.org/mailman/listinfo/toronto-pm">http://mail.pm.org/mailman/listinfo/toronto-pm</a>
</pre>
</blockquote>
</body>
</html>