HTML::TreeBuilder, Tidy.exe

Daniel Chetlin daniel at chetlin.com
Mon Dec 4 02:45:11 CST 2000


[ Finally started to catch up with email, news, and work backlog created
from Thanksgiving vacation. Sorry for the tardiness. ]

On Thu, Nov 16, 2000 at 01:20:28PM -0800, Jeff Zucker wrote:
> Daniel, thanks for a great talk the other night.

My pleasure.

[snip]
> Here's a snippet that will change the base href if one exists, or
> insert one if none exists.
[snip]
> Is this how you'd do it?
> 
> sub insert_base {
>     my($html_string,$new_URI) = @_;
>     use HTML::TreeBuilder;
>     my $tree = HTML::TreeBuilder->new;
>     $tree->parse($html_string);
>     $tree->eof;
>     my $head = $tree->look_down('_tag','head');
>     my $base = $tree->look_down('_tag','base')
>             || $head->new('base');
>     $base->{href} = $new_URI;
>     $head->push_content($base);
>     $html_string = $tree->as_HTML;
>     $tree->delete;
>     return($html_string);
> }

I would probably use TokeParser for this task, but if doing it in
TreeBuilder, it would likely look similar to that.
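For comparison, a TokeParser version might look roughly like the following. This is a minimal sketch, not a tested recipe. `insert_base_tp` is a hypothetical name, and the sketch assumes HTML::Parser (which provides HTML::TokeParser and HTML::Entities) is installed. It copies each token's original source text through unchanged and rewrites the first `<base>` it meets. If there is no `<base>`, it adds one just before `</head>`. If the document has neither, it simply prepends the tag, which is an assumption on my part.

```perl
use strict;
use warnings;
use HTML::TokeParser;
use HTML::Entities qw(encode_entities);

# Hypothetical TokeParser-based variant of insert_base: streams the
# document token by token instead of building a tree.
sub insert_base_tp {
    my ($html_string, $new_uri) = @_;
    my $p        = HTML::TokeParser->new(\$html_string);
    my $base_tag = '<base href="' . encode_entities($new_uri) . '">';
    my ($out, $done) = ('', 0);

    while (my $tok = $p->get_token) {
        my $type = $tok->[0];

        # Replace the first existing <base>; drop any later duplicates.
        if ($type eq 'S' && $tok->[1] eq 'base') {
            $out .= $base_tag unless $done;
            $done = 1;
            next;
        }

        # No <base> seen yet: add one just before </head>.
        if ($type eq 'E' && $tok->[1] eq 'head' && !$done) {
            $out .= $base_tag;
            $done = 1;
        }

        # Re-emit the token's original source text unchanged.
        $out .= $type eq 'T'  ? $tok->[1]
              : $type eq 'S'  ? $tok->[4]
              : $type eq 'E'  ? $tok->[2]
              : $type eq 'PI' ? $tok->[2]
              :                 $tok->[1];   # 'C' comments, 'D' declarations
    }

    # Neither <base> nor </head> found: prepend the tag (assumption).
    $out = $base_tag . $out unless $done;
    return $out;
}
```

Unlike the TreeBuilder version, this leaves the rest of the markup exactly as written instead of regenerating it with `as_HTML`. That may be a reason to prefer it when you don't want the document rewritten.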

> Interestingly this works regardless of whether the original HTML
> includes a head tag or not, since TreeBuilder seems to insert one if
> none exists. 

Yep; that's one of TreeBuilder's interesting quirks^Wfeatures -- it
creates fairly standard/correct HTML where none existed to begin with. I
don't often use TreeBuilder to create HTML, partially because of this;
generally my uses for TreeBuilder are its ability to show me how a
browser would parse HTML, and to help in pulling specific content from a
document.
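A small sketch shows the quirk Jeff noticed. It assumes HTML::TreeBuilder (from the HTML-Tree distribution) is installed. The sketch parses a fragment with no `<html>`, `<head>`, or `<body>` tags and checks where the elements ended up. `dump` then prints the tree TreeBuilder actually built.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Parse a fragment that has no <html>, <head>, or <body> tags at all.
my $tree = HTML::TreeBuilder->new;
$tree->parse('<title>Demo</title><p>Hello');
$tree->eof;

# TreeBuilder supplies the missing structure, much as a browser would:
# <title> goes under <head>, <p> goes under <body>.
my $title_parent = $tree->look_down('_tag', 'title')->parent->tag;
my $para_parent  = $tree->look_down('_tag', 'p')->parent->tag;
print "$title_parent\n";   # head
print "$para_parent\n";    # body

# dump() prints an indented outline of the whole element tree to STDOUT.
$tree->dump;

# Break the tree's circular parent/child references when done.
$tree->delete;
```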

-dlc
TIMTOWTDI


