HTML::TreeBuilder, Tidy.exe

Masque masque at
Thu Nov 16 16:07:51 CST 2000

Majordomo doesn't seem to like subroutine declarations.  :]  I'm commenting out
the line that caused Majordomo to reject this and passing the rest on untouched.

----- Forwarded message from owner-pdx-pm-list at -----

Date: Thu, 16 Nov 2000 16:22:05 -0500 (EST)
From: owner-pdx-pm-list at
To: owner-pdx-pm-list at
Subject: BOUNCE pdx-pm-list at     Admin request of type /^sub\b/i at line 8  

Date: Thu, 16 Nov 2000 13:20:28 -0800
From: Jeff Zucker <jeff at>
X-Mailer: Mozilla 4.7 [en] (Win98; U)
MIME-Version: 1.0
To: Daniel Chetlin <daniel at>
CC: pdx-pm-list at
Subject: HTML::TreeBuilder, Tidy.exe
References: <sa0ead01.009 at> <20001115142918.J314 at> <20001116022715.A999 at>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Daniel, thanks for a great talk the other night.  I've been
experimenting with TreeBuilder.  Here's a snippet that will change the
base href if one exists, or insert one if none exists.  (Not that I ever
use base hrefs; I wrote it up in response to the clpm user who requested
it, but he was too rude to Randal for me to send it to him.)  Is this
how you'd do it?

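# sub insert_base {
    my($html_string,$new_URI) = @_;
    use HTML::TreeBuilder;
    my $tree = HTML::TreeBuilder->new;
    # Parse the incoming HTML; without this the tree never sees $html_string.
    $tree->parse($html_string);
    $tree->eof;
    my $head = $tree->look_down('_tag','head');
    my $base = $tree->look_down('_tag','base');
    unless ($base) {
        # No base tag yet: create one and attach it to the head.
        $base = HTML::Element->new('base');
        $head->push_content($base);
    }
    $base->attr('href', $new_URI);
    $html_string = $tree->as_HTML;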

Interestingly, this works whether or not the original HTML includes a
head tag, since TreeBuilder seems to insert one if none exists.
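
A minimal sketch of that head-insertion behaviour, assuming a fragment
with no head tag at all.  The sample HTML and the variable names are made
up for illustration, and this is only one way to check it:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical input: a bare fragment with no <html>, <head>, or <body>.
my $fragment = '<p>Hello</p>';

my $tree = HTML::TreeBuilder->new;
$tree->parse($fragment);
$tree->eof;

# Look for a head element even though the source never contained one.
my $head  = $tree->look_down('_tag', 'head');
my $found = defined $head ? 1 : 0;

# implicit() is true for elements the parser supplied on its own.
my $implied = ($found && $head->implicit) ? 1 : 0;

print "head found\n"                 if $found;
print "head was added by the parser\n" if $implied;

# Free the tree explicitly when done with it.
$tree->delete;
```

If the head lookup succeeds here, the same lookup in the snippet above
should be safe to rely on for input that lacks a head tag.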

Also, I wanted to mention a great resource one might want to use in
conjunction with HTML::Parser or HTML::TreeBuilder -- the W3C's tidy.exe
program, which does a good job of cleaning up bad HTML, producing XHTML,
and handling several other tasks.


----- End forwarded message -----
