[Charlotte.PM] Tidying HTML output / Temp Files
Cory Foy
Cory.Foy at mobilehwy.com
Thu Apr 21 06:23:55 PDT 2005
This is a two-part question. :)
I am working on a little script that processes the HTML returned by
$mech->content(). Because I ultimately need to run XPath queries against
it, and the HTML is not XHTML, I need to tidy it up first.
My current solution is to get the content back, write it to a temp
file, call Tidy externally and tell it to write its output back to the
same file, and then read the file back in. It looks like:
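####################
use XML::XPath;

my $file = 'c:\\perl\\tmp1.1';
my $out  = $mech->content();
$out =~ s/&(?!#?\w+;)/&amp;/g;   # escape bare ampersands, leave entities alone

# Write the HTML to the same path that Tidy and XML::XPath use below
open my $tmp, '>', $file or die "Can't write $file: $!";
print $tmp $out;
close $tmp or die "Can't close $file: $!";

# Tidy rewrites the file in place as XHTML; exit status 2 means errors
system('c:\\perl\\tidy.exe', '--write-back', 'true', '--output-xhtml', 'true', $file);
die "tidy failed on $file\n" if $? == -1 or $? >> 8 > 1;

my $xp = XML::XPath->new(filename => $file);
####################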
(By the way, my Perl is rusty since I'm just getting back into it after
a while, so syntax suggestions are welcome too.)
So two questions:
1) What is a better way to create a temp file? I don't like hardcoding
the file name; it smells to me.
2) Is there a better way to tidy the output, so that I don't have to
write it to a temp file just so Tidy can process it?
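For question 1, one possible approach is the core File::Temp module, which picks a unique file name and can delete the file automatically when you are done with it. This is only a sketch: the helper name html_to_tempfile and the .html suffix are hypothetical choices for illustration, not part of the script above.

```perl
use strict;
use warnings;
use File::Temp ();

# Hypothetical helper (illustration only): write $html to a uniquely
# named temp file and return the File::Temp object. The file is removed
# automatically when that object goes out of scope, because UNLINK
# defaults to true for File::Temp->new.
sub html_to_tempfile {
    my ($html) = @_;
    my $tmp = File::Temp->new(SUFFIX => '.html');   # unique name in the system temp dir
    print {$tmp} $html;
    close $tmp or die "Can't close " . $tmp->filename . ": $!";
    return $tmp;
}

my $tmp  = html_to_tempfile('<p>Hello &amp; goodbye</p>');
my $file = $tmp->filename;   # path to hand to tidy and XML::XPath
```

The path from $tmp->filename could then take the place of the hardcoded c:\perl\tmp1.1 in both the Tidy call and XML::XPath->new, as long as $tmp stays in scope until XML::XPath has read the file.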
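For question 2, Tidy reads standard input and writes standard output when it is not given a file name, so one option is to pipe the HTML through it with the core IPC::Open2 module and keep everything in memory. This is a sketch under assumptions: filter_through is a hypothetical helper, and writing all of the input before reading any output relies on the child consuming its whole input first. Tidy parses the full document before printing it, so that should hold here, but a filter that streams output as it reads could deadlock on large inputs.

```perl
use strict;
use warnings;
use IPC::Open2 qw(open2);

# Hypothetical helper (illustration only): run an external filter,
# feed it $input on STDIN, and return everything it writes to STDOUT.
# The child's STDERR is not captured; it goes wherever ours goes.
sub filter_through {
    my ($cmd, $input) = @_;    # $cmd: array ref of program + arguments
    my $pid = open2(my $from_child, my $to_child, @$cmd);

    # Send all input, then close so the child sees EOF. Assumes the
    # child reads everything before producing much output.
    print {$to_child} $input;
    close $to_child;

    my $output = do { local $/; <$from_child> };
    close $from_child;
    waitpid($pid, 0);          # sets $? to the child's exit status
    return defined $output ? $output : '';
}
```

For example, filter_through(['c:\\perl\\tidy.exe', '--output-xhtml', 'true', '--quiet', 'true'], $out) should return the tidied markup, which XML::XPath->new(xml => $xhtml) can parse without touching the disk. Afterwards, $? >> 8 holds Tidy's exit status, where 2 indicates errors.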
Thanks!
Cory