[boulder.pm] problem with HTML::LinkExtor and <applet>

Thu Sep 14 14:07:01 CDT 2000

> Hi,
> 
> I'm using HTML::LinkExtor to extract links from some HTML documents.
> Works pretty well.

[...]

> but rather 2 links to
> 
>     ./classes
>     Bounce
> 
> 
> Any thoughts, ideas?  Both HTML::LinkExtor and HTML::Parser are up to
> date.

Having poked around a bit, I still have no real idea how to fix it, but it
looks to me like the failure mode is that an applet tag isn't a legal URI
and thus HTML::Parser isn't parsing it as a link at all. If that's the
case, then 'fixing' HTML::LinkExtor would probably mean breaking
HTML::Parser's standards conformance. It's hackish, but have you looked
at doing a two-pass link extraction: one pass with LinkExtor and another
looking for applet tags? Since they're tags (even if they turn out not to
be URIs), you can probably pull them out with HTML::Parser itself and
reformat them as a link.
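
A rough sketch of what the second pass might look like, assuming the
api_version 3 handler interface of HTML::Parser and the URI module. The
applet_links helper, the sample tag and the example.com base URL are all
made up for illustration, and joining codebase and code this way is my
guess at what you want, so adjust to taste:

```perl
use strict;
use warnings;
use HTML::Parser;
use URI;

# Hypothetical helper: return one absolute URL per <applet> tag in $html.
# $base is the URL the document itself was fetched from.
sub applet_links {
    my ($html, $base) = @_;
    my @links;

    my $parser = HTML::Parser->new(
        api_version => 3,
        start_h     => [
            sub {
                my ($tag, $attr) = @_;
                # Tag and attribute names arrive lowercased by default.
                return unless $tag eq 'applet' && defined $attr->{code};

                # Treat codebase as a directory relative to the document;
                # fall back to the document's own directory if it is absent.
                my $codebase = defined $attr->{codebase} ? $attr->{codebase} : '.';
                $codebase .= '/' unless $codebase =~ m{/$};
                my $dir = URI->new_abs($codebase, $base);

                # Resolve the code attribute against that directory.
                push @links, URI->new_abs($attr->{code}, $dir)->as_string;
            },
            'tagname, attr',
        ],
    );
    $parser->parse($html);
    $parser->eof;
    return @links;
}

# Sample input (assumed, not taken from your documents).
my $html = '<applet codebase="./classes" code="Bounce" width="100" height="100"></applet>';
print "$_\n" for applet_links($html, 'http://www.example.com/demo/index.html');
```

You would then merge whatever this returns with the list LinkExtor
gives you, dropping the separate codebase and code entries it reports
for applet tags.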

Jonathan Conway
Senior DBA
ipoPros.com/TheStreet.com