[Kc] HTTP Links
Eric Wilhelm
scratchcomputing at gmail.com
Fri Aug 4 12:52:59 PDT 2006
# from djgoku
# on Friday 04 August 2006 09:13 am:
>see how they fare with multiline comments.
Methinks they fare well:
$ cat link_extract
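#!/usr/bin/perl
use warnings;
use strict;
use HTML::SimpleLinkExtor;
my $extor = HTML::SimpleLinkExtor->new();
# Slurp all of STDIN at once (local $/ undefines the record separator).
{ local $/; $extor->parse(<STDIN>); }
# Print one link per line; the empty string adds a trailing newline.
print join("\n", $extor->links, '');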
$ curl -s http://www.... | ./link_extract | grep '\.pdf'
http://.../3145_Intro.pdf
http://.../3145_Chap01.pdf
...
Read the source and you'll see that HTML::SimpleLinkExtor subclasses
(via HTML::LinkExtor) all the way down to HTML::Parser, which seems
pretty quick, correct, and time-tested. If you look at the credits,
the caliber of the authors listed also suggests that this stuff all
works pretty well.
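
If you want to check the multiline-comment case yourself, something
like the following rough sketch should do. The sample HTML and the
extract_links helper are made up for illustration (neither comes from
the original script), and it assumes HTML::SimpleLinkExtor is
installed. Since the parser should treat the whole comment as a
comment, I'd expect only the link outside it to be reported.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use HTML::SimpleLinkExtor;

# Hypothetical sample page: one live link, and one link that sits
# inside a comment spanning several lines.
my $html = <<'HTML';
<html><body>
<a href="visible.pdf">current</a>
<!--
  <a href="hidden.pdf">commented out</a>
-->
</body></html>
HTML

# Illustrative helper (not part of the module): return every link
# the extractor reports for a chunk of HTML.
sub extract_links {
    my ($markup) = @_;
    my $extor = HTML::SimpleLinkExtor->new();
    $extor->parse($markup);
    $extor->eof;    # tell HTML::Parser there is no more input
    return $extor->links;
}

# One link per line, with a trailing newline.
print join("\n", extract_links($html), '');
```

A line-by-line regex would likely pick up hidden.pdf as well, which is
the usual argument for using a real parser here.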
--Eric
--
"...the bourgeoisie were hated from both ends: by the proles, because
they had all the money, and by the intelligentsia, because of their
tendency to spend it on lawn ornaments."
--Neal Stephenson
---------------------------------------------------
http://scratchcomputing.com
---------------------------------------------------