SPUG: The Regexp of Infinitude.

David Bitseff dbitsef at uswest.com
Mon Aug 9 13:04:22 CDT 1999


Okay, my other post regarding this did not really help and was not
even in the same thread.  I'll try not to do that again.

Ken pointed out to me that the '*?' metacharacter sequence is indeed
valid and not redundant.  It's the 'match 0 or more, non-greedy'
quantifier.  This metacharacter may be specific to Perl.  I haven't had
much use for it, as I am a very greedy person.
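
For anyone else who has not run into '*?' before, here is a minimal
sketch of the difference.  The sample text is made up for illustration
and is not from Ken's config file.

```perl
use strict;
use warnings;

# Hypothetical sample: two comments on one line.
my $text = '{# one #/} keep {# two #/}';

# Greedy: .* runs to the end of the line, then backs off to the LAST "#/}".
my ($greedy) = $text =~ m!\{#(.*)#/\}!;

# Non-greedy: .*? grows one character at a time and stops at the FIRST "#/}".
my ($lazy) = $text =~ m!\{#(.*?)#/\}!;

print "greedy: [$greedy]\n";   # greedy: [ one #/} keep {# two ]
print "lazy:   [$lazy]\n";     # lazy:   [ one ]
```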

Here is my first stab at a working solution:

m,{#(#*[^\\#]+(?:(?:\\#/}|\\[^\\]|#[^/]|#/[^}])[^\\#]*#*)*)#/},

I know it looks ugly, but the idea is pretty simple.  First match the
opening comment.  Maybe the opening comment will have some '#'
following.  Then match anything that is not special.  The characters
'#' and '\' are special as they symbolize the beginning of special
things like an ending comment or an escaped thing (like an escaped
ending comment).  Try to get as many of those as possible.  If one of
the special things is found, use the sequence of OR'ed items in the
middle to find out how to match the special thing.  After the special
thing, try to get more of the regular type of thing.  You can do the
special/regular matching more than once.  Then there may be some '#'
before the ending comment.  Then match the ending comment.
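
The same steps are easier to follow with the pattern spread out under
the /x modifier.  This is only a sketch: it assumes the groups were
meant to be non-capturing, '#' has to be escaped because /x treats a
bare '#' as the start of a comment, and the name comment_body is made
up for the example.

```perl
use strict;
use warnings;

my $comment_re = qr!
    \{\#                  # opening comment
    (                     # capture 1: the comment text
        \#*               # maybe some hashes right after the opener
        [^\\\#]+          # regular text: anything but backslash or hash
        (?:
            (?: \\\#/\}   # an escaped ending comment
              | \\[^\\]   # a backslash plus one non-backslash character
              | \#[^/]    # a hash that is not followed by a slash
              | \#/[^}]   # hash and slash not followed by a closing brace
            )
            [^\\\#]*      # more regular text
            \#*           # maybe some hashes before the next special thing
        )*
    )
    \#/\}                 # ending comment
!x;

# Hypothetical helper: return the comment text, or undef if nothing matches.
sub comment_body {
    my ($text) = @_;
    return $text =~ $comment_re ? $1 : undef;
}

# Ken's example comment, with some trailing text added for illustration.
my $sample = '{# End comments with the (ignore the backslash) "\#/}" string #/} rest';
print "[", comment_body($sample), "]\n";
```

On that sample the capture stops at the final '#/}', so the escaped
terminator in the middle stays inside the comment text.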

This will not get everything.  I think that a run of consecutive '\'
will fail a match, because the '\\[^\\]' alternative wants something
other than a backslash after each backslash.  Most escaped things
should match.  You should be able to use the same general idea for
matching text surrounded by some sort of delimiter.
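
As a rough illustration of that general idea, here is a sketch for
double-quoted strings with backslash escapes.  It is a variation, not
the pattern above: it uses '\\.' (backslash plus any character) for
the special case, which also copes with runs of backslashes.  The name
quoted_body and the sample strings are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: return the contents of the first double-quoted
# string in $text, or undef if there is none.
sub quoted_body {
    my ($text) = @_;
    return $text =~ m/
        "                   # opening delimiter
        (
            [^"\\]*         # regular text: no quote, no backslash
            (?: \\.         # special thing: backslash plus any character
                [^"\\]*     # then more regular text
            )*
        )
        "                   # closing delimiter
    /xs ? $1 : undef;
}

print quoted_body('say "a \"quoted\" word" now'), "\n";   # a \"quoted\" word
```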


>>>>> On VI Aug MCMXCIX XV:XXXIII:XXVIII, Ken McGlothlen writes:

Ken> For reasons which, I'm afraid, are going to be obscure at the moment, I have
Ken> the following line in my Perl script:

Ken> 	$buffer =~ m#^(\043(.*?[^\\])*?)(\043/\})(.*)#s;

Ken> Basically, this should be interpreted as

Ken> 	Starting with the beginning of a multiline string, a group consisting
Ken> 	of "#" followed by zero or more groups of any characters that do not
Ken> 	end with a backslash, followed by a group consisting exactly of "#/}",
Ken> 	followed by a group containing the rest of the string.

Ken> Another way to look at this would be

Ken> 	Starting with the beginning of a multiline string, a group consisting
Ken> 	of "#" followed by zero or more groups of any characters, followed by a
Ken> 	group consisting exactly of "#/}" which is NOT preceded by a backslash,
Ken> 	followed by a group containing the rest of the string.

Ken> In other words, I'm working on a program which parses a configuration file.
Ken> Its comment syntax (again, for obscure reasons) is

Ken> 	{# comment text #/}

Ken> So I'm working on a state machine that has, at this point of the program,
Ken> already trimmed the beginning brace, and now is trying to collect the rest of
Ken> the comment.  HOWEVER, the following should be legal:

Ken> 	{# End comments with the (ignore the backslash) "\#/}" string #/}

Ken> Seems to me that the regexp should work, but I can't tell---Perl takes pretty
Ken> much forever to run that regexp.

Ken> Is there a more efficient way of writing this, or do I really need to take a
Ken> completely different approach?  (I'd hate to have to do this in lex and yacc.)

Ken> Thanks.

Ken> 							---Ken

Ken> P.S.  Anyone looking for Perl subcontracts?

Ken>  - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Ken>     POST TO: spug-list at pm.org        PROBLEMS: owner-spug-list at pm.org
Ken>  Seattle Perl Users Group (SPUG) Home Page: http://www.halcyon.com/spug/
Ken>  SUBSCRIBE/UNSUBSCRIBE: Replace ACTION below by subscribe or unsubscribe
Ken>         Email to majordomo at pm.org: ACTION spug-list your_address


-- 
David Bitseff
U S WEST Creative Services
dbitsef at uswest.com         
(206) 346-9279
