[pm-h] Houston Digest, Vol 30, Issue 6

Tue May 15 12:04:50 PDT 2007

>
> > I am an intermediate Perl user.  I taught myself Perl by reading
> > "Learning Perl," with some online tutorials, and I have some other
> > reference texts.  I can generally do what I need to with Perl, but my
> > code is far from elegant.  I understand the very basics of
> > object-oriented programming in Perl, but I generally need sample code to
> > get started with modules from CPAN.  I am a professor at Rice University
> > and have found Perl to be invaluable for extracting data for my
> > research, especially the regular expression capabilities of Perl.  I
> > have been unable to attend any of the monthly meetings, but hope to in
> > the future.
> >
> >  For my current project, I am trying to extract historical financial
> > statement data from www.marketwatch.com.  The URL is
> > http://www.marketwatch.com/tools/quotes/financials.asp?symb=ABSD&sid=0&report=2&freq=0
> > I use WWW::Mechanize to download the webpage and then I use
> > HTML::TableExtract to extract the text that I need.  I want to transpose
> > the table at depth=1, count=1 after extracting it so that each year is a
> > row and each variable is a column.  I have not been able to find any
> > documentation on how to extract a column from a table using
> > HTML::TableExtract.
> >
> >  The following simple program downloads the data using WWW::Mechanize,
> > extracts the table with HTML::TableExtract, and prints the output of
> > each row.
> >
> >  #!/usr/bin/perl
> >
> >  use HTML::TableExtract;
> >  use WWW::Mechanize;
> >  use strict;
> >
> >  my $marketwatch = WWW::Mechanize->new( autocheck => 1 );
> >  $marketwatch->get("http://www.marketwatch.com/tools/quotes/financials.asp?symb=ABSD&sid=0&report=2&freq=0");
> >
> >  chomp(my $html = $marketwatch->content);
> >
> >  my $table = HTML::TableExtract->new(keep_html=>0, depth =>
> > 1, count => 1, br_translate => 0 );
> >  $table->parse($html);
> >
> >  foreach my $row ($table->rows) {
> >      print join("\t", @$row), "\n";
> >  }
> >
> >  I am not able to figure out how to use the columns method.  My
> > intuition makes me think it should be something like the following (but
> > my intuition is wrong):
> >
> >  foreach my $column ($table->columns) {
> >      print join("\t", @$column), "\n";
> >  }
> >
> >  The error message I get says:  Can't locate object method "columns" via
> > package "HTML::TableExtract".  The documentation doesn't shed much light
> > (for me anyway).  I can see in the code of the module that the columns
> > method belongs to HTML::TableExtract::Table, but I can't figure out how
> > to use it.
> >
> >  I appreciate any help.  For an experienced programmer, I am sure this
> > is trivial, but I am the closest thing to a programmer in my department,
> > and I don't really have anyone around me that I can get help from.
> >
> > _______________________________________________
> > Houston mailing list
> > Houston at pm.org
> > http://mail.pm.org/mailman/listinfo/houston
> > Website: http://houston.pm.org/
> >
>
> It looks like you need to call the columns method on an
> HTML::TableExtract::Table object, not on an HTML::TableExtract object.
>
> From the docs and your email, maybe something like this could get you
> started:
>
> my $table = HTML::TableExtract->new(keep_html => 0, depth => 1,
>     count => 1, br_translate => 0 );
> $table->parse($html);
>
> # Fetch the table object at depth 1, count 1; columns lives on this object.
> my $t = $table->table(1,1);
>
> foreach my $column ($t->columns) {
>     print join("\t", @$column), "\n";
> }
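
For anyone who wants to try the same idea without hitting the live site, here is a minimal sketch that parses a small inline HTML string. The sample table and its figures are made up for illustration, and depth => 0, count => 0 is used only because the sample has a single top-level table (the real page needs depth 1, count 1 as above). It assumes HTML::TableExtract is installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;

# Made-up stand-in for the downloaded page: one small top-level table.
my $html = <<'HTML';
<table>
  <tr><td>Item</td><td>2005</td><td>2006</td></tr>
  <tr><td>Sales</td><td>100</td><td>120</td></tr>
  <tr><td>Income</td><td>10</td><td>15</td></tr>
</table>
HTML

# The extractor object parses the document...
my $te = HTML::TableExtract->new( depth => 0, count => 0 );
$te->parse($html);

# ...and hands back table objects; rows and columns are called on those.
my @transposed;
foreach my $ts ( $te->tables ) {
    foreach my $column ( $ts->columns ) {
        push @transposed, [@$column];          # keep a copy for later use
        print join( "\t", @$column ), "\n";    # one original column per line
    }
}
```

Each printed line is one column of the original table, so the year columns come out as rows.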


Thanks.  This works perfectly and saved me hours!
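
If the rows have already been collected (for example from the rows loop in the original program), the transpose can also be done by hand in plain Perl. This is only a sketch: the transpose helper and the sample data below are hypothetical and are not part of HTML::TableExtract.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn a list of row array refs into a list of
# column array refs. Short rows leave undef in the missing positions.
sub transpose {
    my @rows = @_;
    my @cols;
    for my $r ( 0 .. $#rows ) {
        for my $c ( 0 .. $#{ $rows[$r] } ) {
            $cols[$c][$r] = $rows[$r][$c];
        }
    }
    return @cols;
}

# Made-up sample data: each row is a variable, each column a year.
my @rows = (
    [ 'Item',   '2005', '2006' ],
    [ 'Sales',  100,    120 ],
    [ 'Income', 10,     15 ],
);

# After transposing, each year is a row and each variable is a column.
my @cols = transpose(@rows);
for my $col (@cols) {
    print join( "\t", map { defined $_ ? $_ : '' } @$col ), "\n";
}
```

Doing it by hand makes it easy to skip or relabel header cells before printing, at the cost of a few extra lines.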