[pm-h] Question about using HTML::TableExtract
Robert Boone
robo4288 at gmail.com
Mon May 14 14:24:53 PDT 2007
On 5/14/07, Richard Price <raprice at gmail.com> wrote:
> I am an intermediate perl user. I taught myself Perl by reading "Learning
> Perl," with some online tutorials and I have some other reference texts. I
> can generally do what I need to with Perl, but my code is far from
> elegant. I understand the very basics of object-oriented programming in
> Perl, but I generally need sample code to get started with modules from
> cpan. I am a professor at Rice University and have found Perl to be
> invaluable for extracting data for my research, especially the regular
> expression capabilities of Perl. I have been unable to attend any of the
> monthly meetings, but hope to in the future.
>
> For my current project, I am trying to extract historical financial
> statement data from www.marketwatch.com. The url is
> http://www.marketwatch.com/tools/quotes/financials.asp?symb=ABSD&sid=0&report=2&freq=0.
> I use WWW::Mechanize to download the webpage and then I use
> HTML::TableExtract to extract the text that I need. I want to transpose the
> table at depth=1, count=1 after extracting it so that each year is a row and
> each variable is a column. I have not been able to find any documentation
> on how to extract a column from a table using HTML::TableExtract.
>
> The following simple program downloads the data using WWW::Mechanize and
> extracts the table with HTML::TableExtract and prints the output of each
> row.
>
> #!/usr/bin/perl
>
> use HTML::TableExtract;
> use WWW::Mechanize;
> use strict;
>
> my $marketwatch = WWW::Mechanize->new( autocheck => 1 );
> $marketwatch->get("http://www.marketwatch.com/tools/quotes/financials.asp?symb=ABSD&sid=0&report=2&freq=0");
>
> chomp(my $html = $marketwatch->content);
>
> my $table = HTML::TableExtract->new(keep_html=>0, depth =>
> 1, count => 1, br_translate => 0 );
> $table->parse($html);
>
> foreach my $row ($table->rows) {
> print join("\t", @$row), "\n";
> }
>
> I am not able to figure out how to use the columns method. My intuition
> makes me think it should be something like the following (but my intuition
> is wrong):
>
> foreach my $column ($table->columns) {
> print join("\t", @$column), "\n";
> }
>
> The error message I get says: Can't locate object method "columns" via
> package "HTML::TableExtract". The documentation doesn't shed much light
> (for me anyway). I can see in the code of the module that the columns
> method belongs to HTML::TableExtract::Table, but I can't figure out how to
> use it.
>
> I appreciate any help. For an experienced programmer, I am sure this is
> trivial, but I am the closest thing to a programmer in my department, and I
> don't really have anyone around me that I can get help from.
>
> _______________________________________________
> Houston mailing list
> Houston at pm.org
> http://mail.pm.org/mailman/listinfo/houston
> Website: http://houston.pm.org/
>
It looks like you need to call the columns method on an
HTML::TableExtract::Table object, not on the HTML::TableExtract object.
From the docs and your email, maybe something like this could get you started:
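my $table = HTML::TableExtract->new(keep_html => 0, depth => 1,
    count => 1, br_translate => 0);
$table->parse($html);
# table(depth, count) returns the HTML::TableExtract::Table object
my $t = $table->table(1, 1);
die "No table found at depth 1, count 1\n" unless defined $t;
# each column is an array reference, just like each row
foreach my $column ($t->columns) {
    print join("\t", map { defined $_ ? $_ : '' } @$column), "\n";
}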
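
If columns does not give you what you want, you could also transpose the
output of rows by hand. Here is a minimal sketch in plain Perl with no
modules. The transpose sub and the sample figures are made up for
illustration; in your script you would pass $t->rows instead of the
hard-coded @rows.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Transpose a list of row array refs into a list of column array refs.
# Missing or undefined cells become empty strings so every output line
# has the same number of fields.
sub transpose {
    my @rows = @_;

    # Find the widest row.
    my $width = 0;
    for my $row (@rows) {
        $width = @$row if @$row > $width;
    }

    my @columns;
    for my $r (0 .. $#rows) {
        for my $c (0 .. $width - 1) {
            my $cell = $rows[$r][$c];
            $columns[$c][$r] = defined $cell ? $cell : '';
        }
    }
    return @columns;
}

# Hypothetical sample data standing in for $t->rows.
my @rows = (
    [ 'Item',       '2005', '2006' ],
    [ 'Sales',      '100',  '120'  ],
    [ 'Net Income', '10',   '15'   ],
);

# Each printed line is now one column of the original table.
for my $line (transpose(@rows)) {
    print join("\t", @$line), "\n";
}
```

With the sample data, the first printed line holds the labels (Item,
Sales, Net Income) and each following line holds one year, which is the
year-per-row layout you described.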