[boulder.pm] problem with HTML::LinkExtor and <applet>

rise rise at ipopros.com
Thu Sep 14 16:19:29 CDT 2000


> Hi, (Jonathan|Jon|John|rise),

s/\|John//i or (after 'use English;') all of the above, with John being the
only deprecated tag (though still legal).
 
> The format is that of a relative URI.

Agreed.
 
[URI spec on subject of handling]
[HTML spec: $codebase->isa('URI') returns true]

The combination of those two pretty much cinches it.

> So all that spec-quoting means, to me, that there's no reason for
> HTML::Parser to be mishandling <applet> URIs:  attribute specs nail
> them down unambiguously.

Agreed. I wonder if someone involved in HTML::Parser failed to check the
same assumption I made.
 
> I like your suggestion about doing two passes (the second pass would need to
> identify the <applet> stuff, re-create the bogus URLs, delete them from the
> data structures where they'd been added, and then generate and insert the
> *correct* URLs.  I may go that route.  I'm just lazy, that's all . . .

It's certainly not pretty. 
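
For what it's worth, here is a rough sketch of a slightly simpler variant: rather than deleting the bogus URLs afterwards, run HTML::LinkExtor with no base at all so the attributes come back exactly as written, then resolve them yourself in a second step. The document base, the sample markup and the variable names are all made up for illustration, and it assumes the URI module is installed alongside HTML::Parser.

```perl
use strict;
use warnings;
use HTML::LinkExtor;
use URI;

# Hypothetical document location and markup, for illustration only.
my $doc_base = 'http://www.example.com/docs/index.html';
my $html     = '<applet code="Clock.class" codebase="/java/classes/"></applet>';

# Pass 1: no callback and no base, so links() hands back the raw attribute values.
my $extor = HTML::LinkExtor->new;
$extor->parse($html);
$extor->eof;

# Pass 2: resolve the links ourselves, treating <applet> specially.
my @resolved;
for my $link ($extor->links) {
    my ($tag, %attr) = @$link;
    if ($tag eq 'applet') {
        # codebase (if present) is resolved against the document base ...
        my $base = exists $attr{codebase}
            ? URI->new_abs($attr{codebase}, $doc_base)
            : URI->new($doc_base);
        # ... and code is then resolved against codebase, not the document.
        push @resolved, URI->new_abs($attr{code}, $base)->as_string
            if exists $attr{code};
    }
    else {
        # Every other tag: resolve each link attribute against the document base.
        push @resolved, map { URI->new_abs($_, $doc_base)->as_string } values %attr;
    }
}

print "$_\n" for @resolved;
# prints http://www.example.com/java/classes/Clock.class
```

This drops the codebase URI itself from the result list; if you want to keep it as a link in its own right, push $base as well.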

> I may wind up heaving this onto CLPM and seeing what happens.  I can't
> believe I'm the first person to be dealing with this.

That might be the best move. After reviewing the spec it looks to me like
code and codebase are _two_ URIs, the latter of which happens to be used
to override the default base URI for the former. HTML::Parser is correctly
parsing this as a pair of URIs, but HTML::LinkExtor is not then resolving
the code URI against the codebase. This is most likely because
HTML::LinkExtor treats code and codebase as separate link attributes and
resolves each one against the document base on its own. If that's the case
then it sees the two as two completely different links (which is the
behaviour you get).
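
A small test case along these lines should show the behaviour. The document base and the applet markup are invented for the example; the callback simply records each link attribute it is handed.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Hypothetical document location and markup, for illustration only.
my $doc_base = 'http://www.example.com/docs/index.html';
my $html     = '<applet code="Clock.class" codebase="/java/classes/"></applet>';

my @seen;
my $extor = HTML::LinkExtor->new(
    sub {
        # Called once per tag with the tag name and its link attributes.
        my ($tag, %attr) = @_;
        push @seen, sprintf('%s %s=%s', $tag, $_, $attr{$_}) for sort keys %attr;
    },
    $doc_base,    # with a base given, every link is made absolute against it
);
$extor->parse($html);
$extor->eof;

print "$_\n" for @seen;
# prints:
#   applet code=http://www.example.com/docs/Clock.class
#   applet codebase=http://www.example.com/java/classes/
```

Both attributes come back resolved against the document base, so the code URL points into /docs/ instead of /java/classes/.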

I'm going to poke into the HTML::LinkExtor source to see if I can find a
callback. I'm starting to wonder if this is the expected behaviour (though
the far more annoying one from your point of view) since you lose a little
bit of information when you combine the two URIs.

--

Jonathan Conway			Remember, it's always darkest when you've 
Senior DBA 			decremented to #000000 and you're about to
ipoPros.com/TheStreet.com	overflow.







