[pgh-pm] Thoughts on that JavaScript problem [resent]

Tom Moertel tom at moertel.com
Thu Feb 13 20:08:00 CST 2003


[I am resending this because I hosed up the first attempt by sending
from a different account.  Sorry if you see this twice.]

Fellow Perl Folk,

After I got home from last night's meeting, I was thinking about the
original JavaScript problem that was the inspiration for David's JS
parser project.  In short, the problem is that we need to locate JS
statements of the form

    window.location = << string expr >>

and wrap the string expression -- however long and complicated it may be
-- with a function call that will adjust the URL so that it points to
our rewriting-proxy service:

    window.location = adjust_URL( << string expr >> )

The tricky part is that in order to determine the location for the final
parenthesis of the function call, we must parse the string expression
according to JS's baroque grammatical and lexical whims.  Nasty.

What other options do we have?

Here's one option.  Maybe we can get rid of the nastiness by avoiding
the need to wrap the string in the first place.  Let's take a closer
look at adjust_URL.  It seems likely that this function must adjust any
URL passed to it by tacking something onto the front in order to
re-target the URL to the proxy service.  Further, the function must pass
through the original URL (in some form) for the proxy service to fetch
on our behalf.  So, given the original URL

    http://original.com/page.html

we want to rewrite it into something like

    http://my.proxy.com/do-proxy/http://original.com/page.html

This being the case, adjust_URL must look something like the following:

    function adjust_URL(url) {
        return "http://my.proxy.com/do-proxy/" + url;
    }

Note that the only adjustment we strictly require is to tack something
onto the front of the original URL.  With this in mind, let's return to
the original JS statement that we want to adjust:

    window.location = << string expr >>

In this statement, it's easy to locate the assignment operator:  just
find the equals sign.  We also know that the string expression,
regardless of how nasty it is, must immediately follow the assignment
operator.  Therefore, at the right-hand side of the (=), we can safely
insert any expression fragment that requires a string expression as its
right-hand side:

    window.location = "http://my.proxy.com/do-proxy/"
                    + << string expr >>
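Because the insertion point is simply "right after the equals sign," this rewrite can be done as a plain text substitution, with no parsing at all.  Here's a minimal sketch (runnable under Node.js, say).  The helper name rewriteWithPlus is made up for illustration, and the regex is deliberately naive: it doesn't skip string literals or comments, and it only looks for the literal spelling `window.location =`.

```javascript
// Hypothetical helper: textually rewrite `window.location = EXPR` into
// `window.location = "http://my.proxy.com/do-proxy/" + EXPR`.
// Naive sketch: string literals and comments are not skipped.
const PROXY_PREFIX = "http://my.proxy.com/do-proxy/";

function rewriteWithPlus(source) {
  // Match the assignment `window.location =`, but not the comparisons
  // `window.location ==` / `===` (the lookahead rejects a second "=").
  const assignment = /\bwindow\.location\s*=(?!=)/g;
  return source.replace(
    assignment,
    (match) => `${match} ${JSON.stringify(PROXY_PREFIX)} +`
  );
}

console.log(rewriteWithPlus('window.location = base + "/page4.html"'));
// prints: window.location = "http://my.proxy.com/do-proxy/" + base + "/page4.html"
```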

So, assuming that the original statement was something like

    window.location = base + "/page4.html"

our modification will yield

    window.location = "http://my.proxy.com/do-proxy/"
                    + base + "/page4.html"

which is exactly what we want.  Yay!  We win, right?

Not quite.  The problem is that the assignment operator (=) binds very
loosely under JS's precedence rules, but the (+) operator we're
inserting binds much more tightly.  If our original string expression
contains an operator that binds more loosely than (+), such as (==) or
(?), precedence will cause our substitution to break.  Consider this
original statement:

    window.location = page == 1 ? "page2.html" : "page1.html"

Applying our modification will yield

    window.location = "http://my.proxy.com/do-proxy/"
                    + page == 1 ? "page2.html" : "page1.html"

Which unfortunately parses like so:

    window.location = ( (("http://my.proxy.com/do-proxy/" + page) == 1)
                        ? "page2.html" : "page1.html" )

Oops.  Not only did we change the meaning of the test in the (?)
operator, we failed to rewrite its result, so we're hosed twice.
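To see the damage concretely, the sketch below evaluates the original expression and the naively rewritten one side by side for two values of page.  The function names are made up, and returning the value stands in for assigning it to window.location, which exists only in a browser.

```javascript
// The original right-hand side from the example (== kept as written).
function originalExpr(page) {
  return page == 1 ? "page2.html" : "page1.html";
}

// Exactly the text produced by the (+) substitution.  Because (+) binds
// tighter than (==) and (?:), the prefix gets glued onto `page`, the
// comparison becomes "http://...1" == 1 (always false), and the result
// is never prefixed.
function naivePlusRewrite(page) {
  return "http://my.proxy.com/do-proxy/" + page == 1 ? "page2.html" : "page1.html";
}

for (const page of [1, 2]) {
  console.log(page, originalExpr(page), naivePlusRewrite(page));
}
// prints:
// 1 page2.html page1.html
// 2 page1.html page1.html
```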

But we're not dead yet.

One solution to the precedence problem is to use parentheses to wrap and
protect the original expression.  After all, that's what parentheses are
for.  Unfortunately, this solution puts us right back where we started: 
having to parse the string expression to find out where to put the
closing parenthesis.

Luckily, there is a better way.  Instead of using (+) in our
substitution, we can use (+=).  This operator has the same precedence as
the original assignment (=), and like (=) it groups from right to left,
so the entire original expression becomes its right-hand operand and we
don't need to parenthesize it.  The only trick is that (+=) must have a
modifiable reference (such as a variable) as its left-hand side.  That's
doable.

Let's see how it works.  Given

    window.location = << string expr >>

we rewrite it as

    window.location = MYBASE += << string expr >>

where MYBASE has been set to "http://my.proxy.com/do-proxy/" beforehand.
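As with any prefix-only rewrite, all we need to find is the equals sign, so a text substitution still does the job.  A minimal sketch follows; rewriteWithMybase is a hypothetical name, and the regex is naive (it doesn't skip string literals or comments).

```javascript
// Hypothetical rewriter for the (+=) variant: turn
// `window.location = EXPR` into `window.location = MYBASE += EXPR`.
function rewriteWithMybase(source) {
  // Match `window.location =` but not `==` / `===`
  // (the lookahead rejects a second "=").
  const assignment = /\bwindow\.location\s*=(?!=)/g;
  return source.replace(assignment, (match) => `${match} MYBASE +=`);
}

console.log(
  rewriteWithMybase('window.location = page == 1 ? "page2.html" : "page1.html"')
);
// prints: window.location = MYBASE += page == 1 ? "page2.html" : "page1.html"
```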

Let's apply this technique to our troublesome example with the (?)
operator and see if it works.  Given

    window.location = page == 1 ? "page2.html" : "page1.html"

we rewrite it as

    window.location = MYBASE += page == 1 ? "page2.html" : "page1.html"

which parses like so:

    window.location = ( MYBASE
                        += (page == 1 ? "page2.html" : "page1.html") )

This is exactly what we want!
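Here's a quick check of the (+=) version, runnable outside a browser.  The fakeWindow object is a stand-in for window, and the line that sets MYBASE stands in for the initialization code that would be injected into the page; both are assumptions of this sketch.

```javascript
// Stand-ins for the browser environment (assumptions of this sketch).
const fakeWindow = { location: "" };
let MYBASE;

function runRewritten(page) {
  // Stand-in for the injected initialization.  Reset on every call,
  // because (+=) stores its result back into MYBASE.
  MYBASE = "http://my.proxy.com/do-proxy/";
  // The rewritten statement, verbatim apart from fakeWindow:
  fakeWindow.location = MYBASE += page == 1 ? "page2.html" : "page1.html";
  return fakeWindow.location;
}

console.log(runRewritten(1)); // http://my.proxy.com/do-proxy/page2.html
console.log(runRewritten(2)); // http://my.proxy.com/do-proxy/page1.html
```

Note the reset at the top of runRewritten: without it, the second run would start from the previous URL and produce "http://my.proxy.com/do-proxy/page2.htmlpage1.html".  So if a rewritten statement can execute more than once on a page, MYBASE would presumably need resetting before each use, not just once at load time.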

The only new problem we introduce is having to ensure that MYBASE is
initialized correctly, but that's an easy one to solve.  We could simply
insert the initialization code into the start of the HTML document's
HEAD element.  Since we're already rewriting the HTML as part of our
proxy service, this extra modification is trivial.
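On the proxy side, the HEAD insertion might look something like this.  It's a sketch assuming the page has an explicit <head> tag (pages without one are returned unchanged); injectMybaseInit and MYBASE_INIT are hypothetical names, and a production rewriter would probably use a real HTML parser rather than a regex.

```javascript
// Hypothetical initialization snippet injected into every proxied page.
const MYBASE_INIT =
  '<script>var MYBASE = "http://my.proxy.com/do-proxy/";</script>';

function injectMybaseInit(html) {
  // First opening <head> tag, with or without attributes, any case.
  // `<header>` does not match: after "head" we require whitespace or ">".
  const headOpen = /<head(\s[^>]*)?>/i;
  // Insert the init script immediately after the tag, so it runs before
  // any other script in the document.
  return html.replace(headOpen, (tag) => tag + MYBASE_INIT);
}

console.log(injectMybaseInit("<html><head><title>t</title></head></html>"));
// prints: <html><head><script>var MYBASE = "http://my.proxy.com/do-proxy/";</script><title>t</title></head></html>
```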

And that solves the puzzle.  I think.

Can anybody spot a flaw in this approach?

Cheers,
Tom






