[Kc] HTTP Links
djgoku
djgoku at gmail.com
Sun Aug 6 11:14:44 PDT 2006
On 8/4/06, Eric Wilhelm <scratchcomputing at gmail.com> wrote:
> # from djgoku
> # on Friday 04 August 2006 09:13 am:
>
> >see how they fare with multiline comments.
>
> me thinks they fare well
>
> $ cat link_extract
> #!/usr/bin/perl
> use warnings; use strict; use HTML::SimpleLinkExtor;
> my $extor = HTML::SimpleLinkExtor->new();
> {local $/; $extor->parse(<STDIN>);}
> print join("\n", $extor->links, '');
>
> $ curl -s http://www.... | ./link_extract | grep '\.pdf'
> http://.../3145_Intro.pdf
> http://.../3145_Chap01.pdf
> ...
#!/usr/bin/perl
#
# My second try, for get_http.pl
# Syntax: get_http.pl filename (pdf|html|tar|etc)
# Todo: Add support for web links (http://blah.com/blah)
use strict;
use warnings;
use HTML::SimpleLinkExtor;
my $extor = HTML::SimpleLinkExtor->new();
# Filename Stuff
my $source = shift @ARGV;
$extor->parse_file($source);
my @links = $extor->links;
# Filetype Stuff
my $filetype = @ARGV ? shift @ARGV : undef;
# Print only links ending in .$filetype; print all links if no type given.
# \Q...\E quotes regex metacharacters in the user-supplied type, and the
# $ anchor makes sure we match the extension, not a substring anywhere.
foreach my $link (@links) {
    next if defined $filetype && $link !~ m{\.\Q$filetype\E$}i;
    print "$link\n";
}
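The Todo above (accepting a web link as well as a local filename) could be handled roughly like this. This is only a sketch: the `is_url`/`get_html` helpers are hypothetical names, and it assumes LWP::Simple is installed, which the original script does not require. The returned HTML would then go to `$extor->parse($html)` instead of `parse_file`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Decide whether the argument names a web page or a local file.
sub is_url {
    my ($source) = @_;
    return $source =~ m{^https?://}i;
}

# Hypothetical wrapper: fetch remote pages with LWP::Simple (assumed
# to be installed), slurp local files directly.
sub get_html {
    my ($source) = @_;
    if (is_url($source)) {
        require LWP::Simple;
        my $html = LWP::Simple::get($source);
        die "fetch failed: $source\n" unless defined $html;
        return $html;
    }
    open my $fh, '<', $source or die "open failed: $source: $!\n";
    local $/;    # slurp mode
    return <$fh>;
}

print is_url('http://blah.com/blah') ? "url\n" : "file\n";
print is_url('chapter01.html')       ? "url\n" : "file\n";
```

Running it prints "url" then "file"; the fetch path only fires when the argument really starts with http:// or https://.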