[pm-h] Question about using HTML::TableExtract

Richard Price raprice at gmail.com
Mon May 14 13:14:36 PDT 2007


I am an intermediate Perl user.  I taught myself Perl by reading "Learning
Perl," with some online tutorials and I have some other reference texts.  I
can generally do what I need to with Perl, but my code is far from
elegant.  I understand the very basics of object-oriented programming in
Perl, but I generally need sample code to get started with modules from
CPAN.  I am a professor at Rice University and have found Perl to be
invaluable for extracting data for my research, especially the regular
expression capabilities of Perl.  I have been unable to attend any of the
monthly meetings, but hope to in the future.

For my current project, I am trying to extract historical financial
statement data from www.marketwatch.com.  The URL is
http://www.marketwatch.com/tools/quotes/financials.asp?symb=ABSD&sid=0&report=2&freq=0.
I use WWW::Mechanize to download the webpage and then I use
HTML::TableExtract to extract the text that I need.  I want to transpose the
table at depth=1, count=1 after extracting it so that each year is a row and
each variable is a column.  I have not been able to find any documentation
on how to extract a column from a table using HTML::TableExtract.
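
To make the goal concrete, here is a minimal sketch of the transposition
itself, independent of any module.  It assumes the extracted rows are
available as array references, as in the row loop of the program in this
message; the transpose helper and the sample figures are made up for
illustration and are not part of HTML::TableExtract.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn a list of row array refs into a list of
# column array refs.  Short rows are padded with empty strings so the
# result stays rectangular.
sub transpose {
    my @rows = @_;

    # Find the widest row.
    my $width = 0;
    for my $row (@rows) {
        $width = @$row if @$row > $width;
    }

    # Build one new row per original column.
    my @cols;
    for my $c (0 .. $width - 1) {
        push @cols, [ map { defined $_->[$c] ? $_->[$c] : '' } @rows ];
    }
    return @cols;
}

# Made-up sample data standing in for the extracted table.
my @rows = (
    [ 'Item',   '2004', '2005', '2006' ],
    [ 'Sales',  '10',   '12',   '15'   ],
    [ 'Income', '1',    '2',    '3'    ],
);

# After transposing, each year is a row and each variable is a column.
print join("\t", @$_), "\n" for transpose(@rows);
```
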

The following simple program downloads the data using WWW::Mechanize,
extracts the table with HTML::TableExtract, and prints each row as a
tab-separated line.

#!/usr/bin/perl

use HTML::TableExtract;
use WWW::Mechanize;
use strict;

my $marketwatch = WWW::Mechanize->new( autocheck => 1 );
$marketwatch->get(
    "http://www.marketwatch.com/tools/quotes/financials.asp?symb=ABSD&sid=0&report=2&freq=0"
);

chomp(my $html = $marketwatch->content);

# Note: $table holds the extractor object, not an individual table.
my $table = HTML::TableExtract->new(
    keep_html => 0, depth => 1, count => 1, br_translate => 0,
);
$table->parse($html);

foreach my $row ($table->rows) {
    print join("\t", @$row), "\n";
}

I am not able to figure out how to use the columns method.  My intuition
makes me think it should be something like the following (but my intuition
is wrong):

foreach my $column ($table->columns) {
    print join("\t", @$column), "\n";
}

The error message I get says:  Can't locate object method "columns" via
package "HTML::TableExtract".  The documentation doesn't shed much light
(for me anyway).  I can see in the code of the module that the columns
method belongs to HTML::TableExtract::Table, but I can't figure out how to
use it.
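
One possible reading of the module source is sketched below.  It assumes
that the extractor's tables method returns HTML::TableExtract::Table
objects and that columns is called on each of those rather than on the
extractor.  The inline HTML is made up so that the sketch runs without
downloading anything, and depth and count are set to 0 only because the
sample contains a single top-level table.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;

# Made-up HTML standing in for the downloaded page.
my $html = <<'HTML';
<table>
  <tr><td>Item</td><td>2005</td><td>2006</td></tr>
  <tr><td>Sales</td><td>12</td><td>15</td></tr>
</table>
HTML

# The extractor object; depth/count select the first top-level table here.
my $te = HTML::TableExtract->new( depth => 0, count => 0 );
$te->parse($html);
$te->eof;    # tell the underlying parser that the input is complete

# tables() yields table objects; columns() is called on each of them.
foreach my $ts ($te->tables) {
    foreach my $column ($ts->columns) {
        # Empty cells may come back undefined, so map them to ''.
        print join("\t", map { defined $_ ? $_ : '' } @$column), "\n";
    }
}
```

With the real page, the same loop would presumably go after the parse
call in the program in this message, keeping depth => 1 and count => 1.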

I appreciate any help.  For an experienced programmer, I am sure this is
trivial, but I am the closest thing to a programmer in my department, and I
don't really have anyone around me that I can get help from.