HTML::TreeBuilder, Tidy.exe
Daniel Chetlin
daniel at chetlin.com
Mon Dec 4 02:45:11 CST 2000
[ Finally started to catch up on the email, news, and work backlog left over
from Thanksgiving vacation. Sorry for the tardiness. ]
On Thu, Nov 16, 2000 at 01:20:28PM -0800, Jeff Zucker wrote:
> Daniel, thanks for a great talk the other night.
My pleasure.
[snip]
> Here's a snippet that will change the base href if one exists, or
> insert one if none exists.
[snip]
> Is this how you'd do it?
>
> sub insert_base {
>     my($html_string,$new_URI) = @_;
>     use HTML::TreeBuilder;
>     my $tree = HTML::TreeBuilder->new;
>     $tree->parse($html_string);
>     $tree->eof;
>     my $head = $tree->look_down('_tag','head');
>     my $base = $tree->look_down('_tag','base')
>             || $head->new('base');
>     $base->{href} = $new_URI;
>     $head->push_content($base);
>     $html_string = $tree->as_HTML;
>     $tree->delete;
>     return($html_string);
> }
I would probably use HTML::TokeParser for this task, but if I were doing
it in TreeBuilder, it would likely look similar to that.
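For comparison, here is a minimal sketch of what a token-based version might look like with HTML::TokeParser. It is not Daniel's code; the function name insert_base_tokeparser is hypothetical. Assumptions: the new <base> goes immediately after <head> (HTML expects <base> to precede any element that uses URLs), any existing <base> is dropped, a <base> found before any <head> is rewritten in place, and a document with neither is returned unchanged (unlike TreeBuilder, no <head> is invented).

```perl
use strict;
use warnings;
use HTML::TokeParser;
use HTML::Entities qw(encode_entities);

# Hypothetical helper: return $html with exactly one <base href="$new_uri">.
sub insert_base_tokeparser {
    my ($html, $new_uri) = @_;
    my $p = HTML::TokeParser->new(\$html)
        or die "Can't create parser: $!";
    my $base     = '<base href="' . encode_entities($new_uri) . '">';
    my $out      = '';
    my $inserted = 0;

    while (my $tok = $p->get_token) {
        my ($type, $tag) = @$tok;

        if ($type eq 'S' && $tag eq 'base') {
            # A fresh <base> already went in after <head>: drop this one.
            next if $inserted;
            # No <head> seen yet (it was omitted): rewrite this <base> in place.
            $out .= $base;
            $inserted = 1;
            next;
        }
        next if $type eq 'E' && $tag eq 'base';    # stray </base>

        # Copy the token's original source text through unchanged:
        # S => [4], E => [2], PI => [2], T/C/D => [1]
        my $text = $type eq 'S'  ? $tok->[4]
                 : $type eq 'E'  ? $tok->[2]
                 : $type eq 'PI' ? $tok->[2]
                 :                 $tok->[1];
        $out .= $text;

        # Put the new <base> first inside <head>.
        if ($type eq 'S' && $tag eq 'head' && !$inserted) {
            $out .= $base;
            $inserted = 1;
        }
    }
    return $out;
}
```

Because each token's original text is copied through, everything except the <base> tag comes back byte-for-byte, whereas TreeBuilder's as_HTML re-serializes the whole tree (adding the implied <html>, <head> and <body> along the way).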
> Interestingly this works regardless of whether the original HTML
> includes a head tag or not, since TreeBuilder seems to insert one if
> none exists.
Yep; that's one of TreeBuilder's interesting quirks^Wfeatures -- it
creates fairly standard/correct HTML where none existed to begin with. I
don't often use TreeBuilder to create HTML, partly because of this;
generally I use TreeBuilder to see how a browser would parse HTML, and
to help pull specific content from a document.
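A small sketch of both uses, assuming a made-up input fragment and a hypothetical helper named tree_summary: the fragment has no <html>, <head> or <body>, yet TreeBuilder supplies them, and look_down then pulls out the title and link targets.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: parse a fragment and report what TreeBuilder made of it.
sub tree_summary {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new;
    $tree->parse($html);
    $tree->eof;

    # The input has no <head> tag, but TreeBuilder creates one anyway.
    my $has_head = defined $tree->look_down('_tag', 'head') ? 1 : 0;

    # Pulling specific content: the title text and every <a href> value.
    my $title_el = $tree->look_down('_tag', 'title');
    my $title    = $title_el ? $title_el->as_text : undef;
    my @hrefs    = map { $_->attr('href') }
                   $tree->look_down('_tag', 'a', sub { defined $_[0]->attr('href') });
    my @paras    = $tree->look_down('_tag', 'p');

    $tree->delete;    # release the tree explicitly, as the docs recommend
    return { has_head => $has_head, title => $title,
             hrefs => \@hrefs, paras => scalar @paras };
}

my $s = tree_summary('<title>Demo</title><p>First <a href="a.html">link</a><p>Second');
printf "head: %d, title: %s, links: %s, paragraphs: %d\n",
    $s->{has_head}, $s->{title}, join(',', @{ $s->{hrefs} }), $s->{paras};
```

To see the full structure TreeBuilder inferred, calling $tree->dump before $tree->delete prints the element tree to STDOUT.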
-dlc
TIMTOWTDI
More information about the Pdx-pm-list mailing list