[Kc] HTTP Links

djgoku djgoku at gmail.com
Sun Aug 6 11:14:44 PDT 2006


On 8/4/06, Eric Wilhelm <scratchcomputing at gmail.com> wrote:
> # from djgoku
> # on Friday 04 August 2006 09:13 am:
>
> >see how the fair with multiline comments.
>
> me thinks they fare well
>
>   $ cat link_extract
>   #!/usr/bin/perl
>   use warnings; use strict; use HTML::SimpleLinkExtor;
>   my $extor = HTML::SimpleLinkExtor->new();
>   {local $/; $extor->parse(<STDIN>);}
>   print join("\n", $extor->links, '');
>
>   $ curl -s http://www.... | ./link_extract | grep '\.pdf'
>   http://.../3145_Intro.pdf
>   http://.../3145_Chap01.pdf
>   ...

#!/usr/bin/perl
#
# My second try, for get_http.pl
# Syntax: get_http.pl filename (pdf|html|tar|etc)
# Todo: Add support for web links (http://blah.com/blah)

use strict;
use warnings;

use HTML::SimpleLinkExtor;

my $extor = HTML::SimpleLinkExtor->new();

# Filename Stuff
my $source = shift @ARGV
    or die "Syntax: $0 filename (pdf|html|tar|etc)\n";
$extor->parse_file($source);
my @links = $extor->links;

# Filetype Stuff: no second argument means print all links
my $filetype = shift @ARGV;

# Print only links ending in .$filetype (quotemeta'd, case-insensitive)
foreach my $link (@links) {
	print "$link\n" if !defined $filetype or $link =~ m{\.\Q$filetype\E$}i;
}
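For the Todo above, one possible sketch (untested against a live site; the URL handling assumes LWP::Simple is installed, and the error messages are mine) is to detect an http/https source, fetch the page body with get(), and hand it to parse() instead of parse_file():

```perl
#!/usr/bin/perl
use strict;
use warnings;

use LWP::Simple qw(get);
use HTML::SimpleLinkExtor;

my $source = shift @ARGV
    or die "Syntax: $0 (filename|url) (pdf|html|tar|etc)\n";
my $filetype = shift @ARGV;    # optional; no argument means print all links

my $extor = HTML::SimpleLinkExtor->new();

if ($source =~ m{^https?://}i) {
    # Web link: fetch the page ourselves, then parse the string
    my $html = get($source)
        or die "Could not fetch $source\n";
    $extor->parse($html);
}
else {
    # Local file: same as before
    $extor->parse_file($source);
}

foreach my $link ($extor->links) {
    print "$link\n" if !defined $filetype or $link =~ m{\.\Q$filetype\E$}i;
}
```

With that in place the curl pipeline from the quoted message collapses to a single invocation, e.g. `./get_http.pl http://www.example.com/ pdf`.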

