[Phoenix-pm] Perl grammar for Perl5 -> Perl6

Larry Wall larry at wall.org
Thu Dec 8 09:11:59 PST 2005


On Thu, Dec 08, 2005 at 11:51:52AM +0200, Yuval Kogman wrote:
: On Wed, Dec 07, 2005 at 16:48:11 -0500, Peter Schwenn wrote:
: > Dear Perl6 Language,
: > 
: > I am a Perl user from near year 0.  For me the easiest way to learn (,
: > track, and get to the point of contributing to) Perl6 would be a Perl
: > grammar (a regex rule set in, preferably, Perl6) that transforms any
: > Perl5 script into a Perl6.  Of course, besides learning Perl6 for a
: > regex'r or Perl5'r such as myself, and tracking, and documenting 6, it
: > would have huge use for Perl5 users making or considering the
: > transition.
: 
: IMHO machine translation is not that good a way to start learning -
: the real benefit of Perl 6 is in the features which have no perl 5
: equivalents and solve problems much more elegantly.

Except it would be lovely to have a smart enough refactoring translator
that it could recognize where those elegant solutions are possible and
at least give the option of attempting them.  Or at least a hint that
there might be a better way.

: The best thing to do is to hang out on #perl6 and get involved with
: the test suite, as well as reading the synopses.
: 
: Perhaps writing a toy program or something like that could also
: help.

Sure, but some of our toys are bigger than others.  :-)

: > Is there such a Perl5->Perl6 translator underway?
: 
: Larry Wall is working on using the perl (5) interpreter to create
: compiled output (as opposed to just something that executes in
: memory) that can then be read by a translator without actually
: parsing perl 5.

Yes, I have a version of 5.9.2 that dumps out some *very* strange
XML that represents, as closely as possible, the exact meaning of
the code to Perl 5, along with all the syntactic bits.  I then filter
that strange XML back into something approximating an AST.  I am in
the process of proving to myself that I'm getting enough information
out of this to recreate the original Perl 5, so I jokingly call this
my Perl5-to-Perl5 translator.  As of today, I'm able to translate
76.57% of the t/*/*.t files that come with the Perl distribution.
Considering that last week this number was down at about 5%, it would
seem that I've been making a lot of progress.  But most of the work
went into that first 5%, and a lot of work will likely go into the
last 5% as well.  To get that first 5% I basically had to completely
refactor the lexer and the grammar without changing anything, which
is of course impossible.  The Perl 5 parser forgets or misplaces an
astounding variety of information that the translator needs, and
you can't just go and tell it to turn off the optimizations, because
in fact most of those optimizations are deeply intertwingled with
semantic analysis and transformations as well.
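
The round-trip measurement described above can be sketched roughly as follows. This is only an illustration, not the actual tooling: the names round_trip_rate and lossy_translate are hypothetical, the sample "files" are made up, and the stand-in translator simply drops comments to mimic a parser that forgets information.

```python
from typing import Callable, Dict


def round_trip_rate(sources: Dict[str, str], translate: Callable[[str], str]) -> float:
    """Return the percentage of sources that `translate` reproduces exactly."""
    if not sources:
        return 0.0
    passed = 0
    for name, text in sources.items():
        try:
            ok = translate(text) == text
        except Exception:
            ok = False  # a crash counts as a failed round trip
        if ok:
            passed += 1
    return 100.0 * passed / len(sources)


def lossy_translate(text: str) -> str:
    # Hypothetical stand-in translator: it forgets everything after a '#',
    # loosely mimicking a parser that throws comments away.
    return "\n".join(line.split("#", 1)[0].rstrip() for line in text.split("\n"))


# Made-up sample inputs standing in for test files.
samples = {
    "a.t": "print 1;\n",
    "b.t": "print 2;  # comment\n",
    "c.t": "my $x = 3;\n",
    "d.t": "# header\nprint 4;\n",
}

rate = round_trip_rate(samples, lossy_translate)
print(f"{rate:.2f}% of files round-trip exactly")
```

Only the two samples without comments survive this stand-in translator, so the reported rate is driven entirely by what the translator forgets.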

Basically, every skipspace() in toke.c and every op_free() in op.c
and every rule reduction in perly.y loses necessary information.
To attempt to do what I'm currently doing you would have to be
completely insane like me.  It's a total nightmare.  If I were
Catholic I'd be hoping this all counts as penance for my past sins,
and gets me out of 100 million years of Purgatory or so.  But being
a Protestant, I'm merely repenting of my past sins, and thinking
about maybe repenting my future ones.

And if I were Jewish I'd've said "Oy vey" many times over.  :-)

Anyway, once I get to 100% of the t/ files, I'll make it translate
all of CPAN back to itself.  And at some point I'll take a first
whack at the Perl5-to-Perl6 translator, then open it up for community
participation.  It's still just a bit too early for that, though,
because there's such a delicate interplay between refactoring bits
of perl without changing anything vs trying to guess whether we are
getting enough type and structural information out to recreate the
original in the backend.  There are already more than 10000 lines of
code in the backend just to undo the damage done by the Perl 5 engine.
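
As a rough illustration of the kind of information at stake, here is a toy tokenizer that keeps the whitespace and comments in front of each token instead of skipping them, so the original text can be recreated exactly. It is a sketch under heavy assumptions: the Token type and the tokenize/untokenize helpers are hypothetical, the token rules are far simpler than real Perl 5 (for instance, every '#' is treated as starting a comment), and it has nothing to do with the actual toke.c.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    trivia: str  # whitespace and comments that preceded the token
    text: str    # the token itself ("" for trailing trivia at end of input)


# Trivia is any run of whitespace and '#' comments; a token is a word,
# a single other character, or the end of the input.
_PATTERN = re.compile(r"(?P<trivia>(?:\s+|#[^\n]*)*)(?P<text>\w+|[^\w\s]|\Z)")


def tokenize(source: str) -> List[Token]:
    """Split source into tokens, attaching skipped text to the next token."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = _PATTERN.match(source, pos)
        tokens.append(Token(m.group("trivia"), m.group("text")))
        pos = m.end()
    return tokens


def untokenize(tokens: List[Token]) -> str:
    """Recreate the original text from tokens plus their trivia."""
    return "".join(t.trivia + t.text for t in tokens)


src = "my $x = 1;  # set x\nprint $x;\n"
tokens = tokenize(src)
print(untokenize(tokens) == src)               # trivia kept: exact round trip
print("".join(t.text for t in tokens) == src)  # trivia dropped: information lost
```

Keeping the skipped text on the token is one simple design for this; the point is only that once it is discarded, no later stage can recover it.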

: Before this happens this will be very very hard - the high level
: language has vast amounts of implications on execution etc, but the
: opcode tree is much simpler to predict (for a computer).

Right.  But my intent is to write a really good translator, and that
implies that it has to be a multi-level translation.  That involves
keeping track of all the subtle semantic and pragmatic information
as well as the basic syntactic information.  Otherwise we might as
well just feed Perl 5 to Babelfish and see if Perl 6 comes out...

: > p.s. The developing form of such a grammar could likely lead to
: > a grammar package which facilitates rule sets for languages in
: > other domains, in terms of illuminating means of choosing among modes
: > for rule ordering, collecting, scoping, re-application, recursion,
: > exclusion and so forth.
: 
: Since perl 5's actual parser and tokenizer will be used for this it
: won't be very extensible, but this is important because perl is
: reaaaaaaaaaaaaaaaaaaaaaaaaallly hard to parse.

And it's oh about 20 times harder to tell if you've parsed it
correctly from a semantic point of view.  Consider that every little
instrumentation tweak I've made to the lexer has had about a 50%
chance of inducing strange distortions in the meaning of "Perl".
I would be completely lost without the existing regression tests.
That is why I went for minimal instrumentation and try to undo most
of the damage in the back end.  (Or I guess it'll be a "middle end"
once we start translating to Perl 6.)

As for the original question, I think that the Perl 6 grammar will
be a much better example for how to parse other languages than a
Perl 5 grammar would be, since one of the underlying design currents
from the beginning has been that Perl 6 had to be a language that
was amenable to parsing by Perl 6 rules (with a little help from a
bottom-up operator-precedence expression parser).
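
For readers unfamiliar with operator-precedence parsing, one well-known bottom-up technique is Dijkstra's shunting-yard algorithm, sketched below for a toy arithmetic language. This is not the Perl 6 expression parser: the OPS table, the to_rpn function and the token rules are all assumptions made up for the illustration.

```python
import re
from typing import List

# Hypothetical precedence table: operator -> (precedence, associativity).
OPS = {"+": (1, "L"), "-": (1, "L"), "*": (2, "L"), "/": (2, "L"), "**": (3, "R")}


def to_rpn(expr: str) -> List[str]:
    """Convert an infix expression to reverse Polish notation (shunting-yard)."""
    # Toy lexer: integers, operators and parentheses; anything else is ignored.
    tokens = re.findall(r"\d+|\*\*|[-+*/()]", expr)
    output: List[str] = []
    stack: List[str] = []
    for tok in tokens:
        if tok.isdigit():
            output.append(tok)
        elif tok in OPS:
            prec, assoc = OPS[tok]
            # Reduce operators on the stack that bind at least as tightly.
            while stack and stack[-1] in OPS:
                top_prec = OPS[stack[-1]][0]
                if top_prec > prec or (top_prec == prec and assoc == "L"):
                    output.append(stack.pop())
                else:
                    break
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        else:  # tok == ")"
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise ValueError("unbalanced parentheses")
            stack.pop()  # discard the matching "("
    while stack:
        op = stack.pop()
        if op == "(":
            raise ValueError("unbalanced parentheses")
        output.append(op)
    return output


print(to_rpn("1 + 2 * 3"))
print(to_rpn("2 ** 3 ** 2"))
```

The precedence table is the only part that knows about individual operators, which is why this style of parser pairs conveniently with a rule-based grammar for everything outside expressions.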

Larry

