[Charlotte.PM] Tidying HTML output / Temp Files

Cory Foy Cory.Foy at mobilehwy.com
Thu Apr 21 06:23:55 PDT 2005


This is a two-in-one question. :)

I am working on a little script that processes some HTML returned from 
a web request. Because I ultimately need to run XPath queries against 
it, and the HTML is not XHTML, I need to tidy it up first.

My current solution is to get the content back, write it to a temp 
file, make an external call to Tidy telling it to write back to the 
same file, and then read the file back in. It looks like:

####################
my $out = $mech->content();
$out =~ s/&/&amp;/g;

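# Use one path for writing, tidying, and parsing.
my $file = 'c:\\perl\\tmp1.1';

open my $temp, '>', $file or die "Cannot open $file: $!";
print {$temp} $out;
close $temp or die "Cannot close $file: $!";

# Tidy rewrites the file in place as XHTML.
system('c:\\perl\\tidy.exe', '--write-back', 'true', '--output-xhtml', 'true', $file);

my $xp = XML::XPath->new(filename => $file);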
####################

(By the way, I'm sure my Perl is rusty. I'm just getting back into it 
after a while, so syntactic suggestions are welcome too.)

So two questions:

1) What is a better way to get a temp file? I don't like the hardcoding 
of a file name - it smells to me.

2) Is there a better way to tidy the output so that I don't have to rely 
on writing to a temp file which has to be processed by tidy?
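
For question 1, one possible direction is the File::Temp module that 
ships with Perl. The following is only a minimal sketch: the template 
name 'tidyXXXXXX' and the sample value of $out are placeholders made up 
for illustration, and the tidy and XML::XPath steps are left out.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Placeholder standing in for the content returned by $mech->content().
my $out = '<html><body><p>Hello &amp; goodbye</p></body></html>';

# tempfile() creates and opens a uniquely named file and returns both the
# handle and the path. TMPDIR => 1 puts it in the system temp directory;
# UNLINK => 1 asks for it to be removed when the program exits.
my ($fh, $file) = tempfile('tidyXXXXXX', SUFFIX => '.html',
                           TMPDIR => 1, UNLINK => 1);

print {$fh} $out;
close $fh or die "Cannot close $file: $!";

# $file could now be passed to tidy and then to
# XML::XPath->new(filename => $file).
print "Wrote ", (-s $file), " bytes to $file\n";
```

Because the name is generated at run time, nothing is hardcoded and two 
copies of the script running at once would not overwrite each other's 
file.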

Thanks!

Cory


