[Kc] HTTP Links

Eric Wilhelm scratchcomputing at gmail.com
Fri Aug 4 12:52:59 PDT 2006


# from djgoku
# on Friday 04 August 2006 09:13 am:

>see how they fare with multiline comments.

Methinks they fare well:

  $ cat link_extract
  #!/usr/bin/perl
  use warnings; use strict; use HTML::SimpleLinkExtor;
  my $extor = HTML::SimpleLinkExtor->new();
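  # slurp all of STDIN in one read (undef $/ turns off line-by-line input)
  {local $/; $extor->parse(<STDIN>);}
  # print one link per line, ending with a newline
  print join("\n", $extor->links, '');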


  $ curl -s http://www.... | ./link_extract | grep '\.pdf'
  http://.../3145_Intro.pdf
  http://.../3145_Chap01.pdf
  ...

Read the source and you'll see that it subclasses HTML::LinkExtor, which 
in turn subclasses HTML::Parser, which seems pretty quick, correct, and 
time-tested.  If you look at the credits, the caliber of the authors 
listed also suggests that this stuff all works pretty well.
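
For the multiline-comment case specifically, here is a minimal sketch, 
assuming HTML::SimpleLinkExtor is installed.  The sample page, the 
example.com URLs, and the extract_links helper are made up for 
illustration; only new, parse, and links come from the script above.  
Since HTML::Parser treats everything between <!-- and --> as a comment, 
the commented-out link should be skipped and only the live one printed.

```perl
#!/usr/bin/perl
use warnings; use strict; use HTML::SimpleLinkExtor;

# Hypothetical test page: one link buried in a multi-line comment,
# one link in live markup.
my $html = <<'HTML';
<html><body>
<!--
  <a href="http://example.com/commented_out.pdf">old</a>
-->
<a href="http://example.com/live.pdf">new</a>
</body></html>
HTML

# Illustrative helper (not part of the module): parse a string and
# return whatever links the extractor found.
sub extract_links {
  my ($text) = @_;
  my $extor = HTML::SimpleLinkExtor->new();
  $extor->parse($text);
  return $extor->links;
}

print join("\n", extract_links($html), '');
```

A line-oriented regex would likely trip over this page, because the 
commented-out anchor sits on its own line and looks like a real link.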

--Eric
-- 
"...the bourgeoisie were hated from both ends: by the proles, because
they had all the money, and by the intelligentsia, because of their
tendency to spend it on lawn ornaments."
--Neal Stephenson
---------------------------------------------------
    http://scratchcomputing.com
---------------------------------------------------

