[Omaha.pm] Help with parsing HTML

Ryan Stille rps at willcomminc.com
Wed Sep 28 08:41:37 PDT 2005


> What are you trying to extract?

For example, I'd like the content inside each set of
<SCRIPT></SCRIPT> tags in a given file.

Jay, I tried your suggestion of Text::Balanced, but didn't have any
luck.

Here's what I did with Text::Balanced:
__________________________
use Text::Balanced qw ( extract_tagged );

foreach $arg ( @ARGV ) {
  open (IN,$arg) or next;
  local $/;
  $filecontent = <IN>;

  ($extracted, $remainder)
     = extract_tagged($filecontent, '<SCRIPT>', '</SCRIPT>', undef, undef);

  print "extracted: $extracted\n";
  print "remainder: $remainder\n";
  }
___________________________

But nothing was ever returned in the $extracted variable, everything was
always in the remainder.  I tried many variations of the 2nd and 3rd
arguments to extract_tagged() but nothing worked.  Is there anything
obviously wrong with how I am using it?  Once I get that to work I plan
to put it inside a while loop to continue to call extract_tagged() until
I've gone through the whole file.
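
One possible explanation, offered as a sketch rather than a confirmed
diagnosis: as far as the Text::Balanced documentation describes it,
extract_tagged() only matches a tag at the very start of the string
(after optional leading whitespace), so a file that begins with anything
else would come back entirely in the remainder. The loop below works
around that by using index() and substr() to move to the next <SCRIPT>
before each call. The sample string and the @scripts and @contents
arrays are made up for illustration, and it assumes the tags appear
exactly as '<SCRIPT>' with no attributes or lowercase variants.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_tagged);

# Hypothetical sample input standing in for the slurped file content.
my $text = '<HTML><SCRIPT>var a = 1;</SCRIPT><P>hi</P>'
         . '<SCRIPT>var b = 2;</SCRIPT></HTML>';

my (@scripts, @contents);

# Keep going while there is another literal <SCRIPT> tag in the text.
while ((my $start = index($text, '<SCRIPT>')) >= 0) {
    # Drop everything before the tag so the string starts with it.
    $text = substr($text, $start);

    # In list context the fifth return value is the text between the tags.
    my ($extracted, $remainder, undef, undef, $inner)
        = extract_tagged($text, '<SCRIPT>', '</SCRIPT>');

    # Stop on failure (for example, a missing closing tag).
    last unless defined $extracted && length $extracted;

    push @scripts,  $extracted;
    push @contents, $inner;
    $text = $remainder;    # continue with what is left
}

print "extracted: $_\n" for @scripts;
print "content:   $_\n" for @contents;
```

For a real file, $text would be the slurped $filecontent. Tags written
as <script type="..."> or in mixed case would need a pattern instead of
the literal index() lookup.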

-Ryan


