Phoenix.pm: Parsing comments

Thomas Whitney whitneyt at agcs.com
Wed Nov 28 16:50:50 CST 2001


Thanks Tim,

I found out later from the guy I was helping that the question I posted was only part of the problem. The tool needs to go through the text, find the
three types of comments, replace each comment's text with a unique string, and save the part that was replaced. Comments of the same
type cannot be nested, and the input is guaranteed not to have any weird mismatched comment brackets. But there can be empty comments, and it needs
to keep track of those. My solution is listed below in case anybody is interested. Please let me know if anybody has a better way to do it.

my $p = 1;
my $repl = 'STR';
my @save = ();
my $line = "/*co{mm}ent*/ not {another comment} not (*one more comment*) not a comment";
$line =~ s{(/\*)(.*?)(\*/)|(\(\*)(.*?)(\*\))|(\{)(.*?)(\})}{
              # Defined-or (Perl 5.10+) keeps a comment whose text is "0"; || would save '' instead.
              push @save, $2 // $5 // $8;
              ($1 || $4 || $7) . $repl . $p++ . ($3 || $6 || $9);
          }xesg;

print "$line\n";
print map "[$_]\n", @save;
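
In case it is useful, here is the same idea wrapped in a sub so it can be run on several lines, as a rough sketch only. The name mask_comments and its argument order are made up for illustration and are not part of the tool. It assumes, as stated above, that comments of the same type are never nested and that the delimiters always pair up. The example input exercises an empty comment and a comment whose text is "0".

```perl
use strict;
use warnings;

# Hypothetical helper: replaces the text of each comment with a numbered
# placeholder and returns the new line plus a reference to the saved texts.
sub mask_comments {
    my ($line, $repl) = @_;
    my $n = 1;
    my @saved;
    $line =~ s{
        (/\*)  (.*?) (\*/)      # C-style comment
      | (\(\*) (.*?) (\*\))     # paren-star comment
      | (\{)   (.*?) (\})       # brace comment
    }{
        # Only one alternative matches, so exactly three groups are defined.
        my ($open, $text, $close) =
            grep { defined $_ } $1, $2, $3, $4, $5, $6, $7, $8, $9;
        push @saved, $text;     # an empty comment is saved as ''
        $open . $repl . $n++ . $close;
    }xsge;
    return ($line, \@saved);
}

my ($masked, $saved) = mask_comments('a /**/ b {0} c (*x*)', 'STR');
print "$masked\n";              # a /*STR1*/ b {STR2} c (*STR3*)
print "[$_]\n" for @$saved;     # [], [0], [x]
```

Picking the defined groups instead of chaining || is what keeps the '' and "0" entries in the saved list.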

Thanks
Tom

Tim Ayers wrote:

> >>>>> "T" == Thomas Whitney <whitneyt at agcs.com> writes:
> T> Hi Group,
> T> I am helping somebody write a simple comment parser. "{}" comments can be inside "/**/" comments, and there could be an empty comment.
>
> I don't understand exactly. What is the comment delimiter? {}? /**/?
> From your code below it looks like comments can be delimited by /**/,
> {}, or even (**). [ Ed: Why does anyone need 3 kinds of balanced
> comment delimiters? That makes a hard problem even harder. ]
>
> Read "perldoc -q balance". This is a hard problem. Here are a couple
> of examples why:
>
> /* the comment end-delimiter is */ */
> /* /* nested comment */ */
>
> If you want something that always works look at the Parse::RecDescent
> module.
>
> T> Below is an attempt at it. It appears to work except the |ed
> T> expressions return empty.  I could probably do it with a few lines,
> T> but does anybody have any ideas for a better one liner?
>
> I've been trying to write a slick way that works when there isn't any
> monkey business, but I haven't found it yet. In the meantime you can
> fix yours with a little filtering. Not elegant, but it works.
>
>   $_ = "/*co{mm}ent*/";
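>   # grep drops the undef slots left by the alternatives that did not match
>   # (note that /\S/ also drops empty and whitespace-only comments)
>   my @save =  grep /\S/,
>                 m%(?:/\*(.*?)\*/ |
>                      \{(.*?)\}   |
>                      \(\*(.*?)\*\))%xsg;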
>   print "[$_]\n" for @save;
>
> HTH and
> Hope you have a very nice day, :-)
> Tim Ayers (tim.ayers at reuters.com)
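
For anyone following along, a small sketch of why the two example inputs in the quoted message defeat a plain non-greedy match. The helper name strip_first_comment is made up for illustration. It simply removes the first /* ... */ span the way the regexes in this thread would see it.

```perl
use strict;
use warnings;

# Hypothetical helper: removes the first span that a non-greedy
# /* ... */ pattern matches and returns whatever is left.
sub strip_first_comment {
    my ($text) = @_;
    $text =~ s{/\*.*?\*/}{}s;
    return $text;
}

for my $input ('/* the comment end-delimiter is */ */',
               '/* /* nested comment */ */') {
    # The match stops at the first closing delimiter, so both inputs
    # leave a stray ' */' behind.
    printf "input: [%s]\n left: [%s]\n", $input, strip_first_comment($input);
}
```

The regex has no way to count how many comments are open, which is why the guarantee of no nesting within a type matters for the one-liner approach.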



