[Kc] HTTP Links
Eric Wilhelm
scratchcomputing at gmail.com
Mon Aug 7 09:01:01 PDT 2006
# from Frank Wiles
# on Monday 07 August 2006 07:43 am:
> Here is a slightly more robust version:
But how do you know it's more robust if you can't test it? Here's a
minor refactoring into a modular form. Start with:
  perl -e 'my $package = require("./extract_links");
    my $main = eval("\\&${package}::main");
    $main->("-h");'
Then break the stuff in main() out into individual subs.
getoptions() would be a good first candidate if you parse the options
into a hash. Then your tests could do:
  my ($opts, @args) = bin::extract_links::getoptions(qw(
    --url http://example.com --type html
  ));
  ok(ref($opts), 'is a hash');
  ok(@args == 0);
  is($opts->{url}, 'http://example.com');
  # etc (or even is_deeply)
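For the sake of illustration, here is a rough sketch of what such a getoptions() might look like. It is an assumption, not part of the script below: it uses the core Getopt::Long module (GetOptionsFromArray) rather than Getopt::Helpful, and the hash keys simply mirror the option names.

```perl
use strict;
use warnings;

package bin::extract_links;

use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical getoptions(): parse a copy of the arguments into a hash
# and return the hash reference followed by any leftover arguments.
sub getoptions {
  my (@args) = @_;

  # same default as $file_type in main()
  my %opts = (type => '*');

  # GetOptionsFromArray removes the options it recognizes from @args
  GetOptionsFromArray(\@args, \%opts,
    'file=s', 'url=s', 'type=s', 'help',
  ) or die "bad options\n";

  return(\%opts, @args);
}
```

Since it never touches @ARGV or the network, the tests above can call it with any argument list they like.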
So, when you add the feature "figure out whether it is a url", you can
test that the DWIM is working without having a network connection.
You could, of course, use IPC::Run and do tests that way, but you can't
unit test if there aren't units.
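As an example of such a unit, the "figure out whether it is a url" check could live in its own sub. This is only a sketch: is_url() is a hypothetical name, and the rule it applies (anything starting with scheme:// counts as a url) is my assumption about what the DWIM should do.

```perl
use strict;
use warnings;

package bin::extract_links;

# Hypothetical helper: guess whether an argument names a url
# (scheme://...) rather than a local file. No network access needed.
sub is_url {
  my ($source) = @_;
  return($source =~ m{^[a-z][a-z0-9+.-]*://}i ? 1 : 0);
}
```

A test is then just a matter of calling bin::extract_links::is_url() with a few strings and checking for 1 or 0.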
--- extract_links
#!/usr/bin/perl

use warnings;
use strict;

=head1 NAME

extract_links - extract links from HTML documents

=cut

package bin::extract_links;

use HTML::SimpleLinkExtor;
use LWP::UserAgent;  # needed by the --url branch
use Getopt::Helpful;

sub main {
  my (@args) = @_;

  my $extor = HTML::SimpleLinkExtor->new();

  # Some defaults
  my $local_file;
  my $remote_url;
  my $file_type = '*';

  # Parse commandline options
  my $opts = Getopt::Helpful->new(
    usage => "CALLER --file /path/to/file [options]\n" .
      " or\n" .
      " --url http://example.com [options]",
    ['file=s', \$local_file, '/path/file.html','read a file'],
    ['url=s' , \$remote_url, 'http://example.com/','fetch a url'],
    ['type=s', \$file_type , 'html|pdf','type of file to get'],
    '+help',
  );
  $opts->Get_from(\@args);

  # Handle a local file or a url
  my @links;
  if( $local_file ) {
    $extor->parse_file( $local_file );
    @links = $extor->links;
  }
  elsif ( $remote_url ) {
    my $ua = LWP::UserAgent->new;
    # get() always returns a response object, so check is_success
    # and report the status line ($! is not set by LWP)
    my $response = $ua->get($remote_url);
    if( $response->is_success ) {
      $extor->parse( $response->content );
      @links = $extor->links;
    }
    else {
      die "Unable to retrieve the URL '$remote_url': ",
        $response->status_line, "\n";
    }
  }
  else {
    $opts->usage("You must specify either --file or --url");
  }

  if( $file_type ne '*' ) {
    # group the type so that 'html|pdf' alternates after the dot
    print join("\n", grep( /\.(?:$file_type)/i, @links ) );
  }
  else {
    print join("\n", @links);
  }
}

package main;

if($0 eq __FILE__) {
  bin::extract_links::main(@ARGV);
}

# vi:ts=2:sw=2:et:sta
my $package = 'bin::extract_links';
# EOF
--Eric
--
"Because understanding simplicity is complicated."
--Eric Raymond
---------------------------------------------------
http://scratchcomputing.com
---------------------------------------------------