Phoenix.pm: Parsing comments

Anthony Nemmer edelweiss at qwest.net
Wed Nov 28 17:02:38 CST 2001


If the regex gets too complicated, I'd opt for splitting
the string/document into characters and iterating over them,
keeping track of comment delimiters with flags.  =)
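
Here is a rough sketch of that idea, not a tested drop-in. It assumes
the rules quoted below: three comment styles, no nesting of the same
type, and anything inside an open comment (such as {} inside /**/) is
just comment text. The name strip_comments and its two arguments are
made up for illustration. The $closer variable is the "flag": it is
defined only while the scanner is inside a comment.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the thread: walk the text one character at a
# time and swap each comment body for a numbered placeholder.
sub strip_comments {
    my ($text, $repl) = @_;
    my %close_for = ('/*' => '*/', '(*' => '*)', '{' => '}');
    my @chars  = split //, $text;
    my $out    = '';
    my $body   = '';
    my $closer;            # the flag: defined only while inside a comment
    my @save;

    my $i = 0;
    while ($i < @chars) {
        my $one = $chars[$i];
        my $two = $i + 1 < @chars ? $one . $chars[$i + 1] : $one;

        if (defined $closer) {
            # Inside a comment: only the matching closer ends it.
            my $here = length($closer) == 2 ? $two : $one;
            if ($here eq $closer) {
                push @save, $body;
                $out .= $repl . scalar(@save) . $closer;
                $i   += length $closer;
                ($body, $closer) = ('', undef);
            }
            else { $body .= $one; $i++; }
        }
        else {
            # Outside a comment: try two-character openers before "{".
            my ($opener) = grep { exists $close_for{$_} } ($two, $one);
            if (defined $opener) {
                $out   .= $opener;
                $closer = $close_for{$opener};
                $i     += length $opener;
            }
            else { $out .= $one; $i++; }
        }
    }
    $out .= $body if defined $closer;   # unterminated comment: leave text as is
    return ($out, @save);
}

my ($line, @saved) = strip_comments(
    '/*co{mm}ent*/ not {another comment} not (*one more comment*) not a comment',
    'STR');
print "$line\n";
print map "[$_]\n", @saved;
```

On the sample line from the quoted message this prints
"/*STR1*/ not {STR2} not (*STR3*) not a comment" followed by the three
saved bodies in brackets. Empty comments are saved as empty strings, and
a comment whose whole text is "0" is saved as "0", since nothing here
relies on truth tests of the captured text.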

At 03:50 PM 11/28/01 -0700, you wrote:
>Thanks Tim,
>
>I found out later from the guy I was helping that the question I posted
>was only part of the problem. This tool needs to go through, find the
>three types of comments, and replace the comment text with a unique string,
>then save the part that was replaced. Comments of the same
>type cannot be nested, and it is guaranteed not to have any weird
>mismatched comment brackets. But there can be empty comments, and it needs
>to keep track of those. My solution is listed below in case anybody is
>interested. Please let me know if anybody has a better way to do it.
>
>my $p = 1;
>my $repl = 'STR';
>my @save = ();
>my $line = "/*co{mm}ent*/ not {another comment} not (*one more comment*) not a comment";
>$line =~ s/(\/\*)(.*?)(\*\/)|(\(\*)(.*?)(\*\))|(\{)(.*?)(\})/
>              eval{ push @save, $2||$5||$8||'' ; return ($1||$4||$7).$repl.$p++.($3||$6||$9) }
>          /xesg;
>
>print "$line\n";
>print map "[$_]\n", @save;
>
>Thanks
>Tom
>
>Tim Ayers wrote:
>
>> >>>>> "T" == Thomas Whitney <whitneyt at agcs.com> writes:
>> T> Hi Group,
>> T> I am helping somebody write a simple comment parser. "{}" comments
>> T> can be inside "/**/" comments, and there could be an empty comment.
>>
>> I don't understand exactly. What is the comment delimiter? {}? /**/?
>> From your code below it looks like comments can be delimited by /**/,
>> {}, or even (**). [ Ed: Why does anyone need 3 kinds of balanced
>> comment delimiters? That makes a hard problem even harder. ]
>>
>> Read "perldoc -q balance". This is a hard problem. Here are a couple
>> of examples why:
>>
>> /* the comment end-delimiter is */ */
>> /* /* nested comment */ */
>>
>> If you want something that always works look at the Parse::RecDescent
>> module.
>>
>> T> Below is an attempt at it. It appears to work except the |ed
>> T> expressions return empty.  I could probably do it with a few lines,
>> T> but does anybody have any ideas for a better one liner?
>>
>> I've been trying to write a slick way that works when there isn't any
>> monkey business, but I haven't found it yet. In the meantime you can
>> fix yours with a little filtering. Not elegant, but it works.
>>
>>   $_ = "/*co{mm}ent*/";
>>   my @save =  grep { defined }   # drop unused alternatives, keep empty comments
>>                 m%(?:/\*(.*?)\*/ |
>>                      \{(.*?)\}   |
>>                      \(\*(.*?)\*\))%xsg;
>>   print "[$_]\n" for @save;
>>
>> HTH and
>> Hope you have a very nice day, :-)
>> Tim Ayers (tim.ayers at reuters.com)
>