[pm-h] Strip Links from FTP

Mike Flannigan mikeflan at att.net
Thu Apr 2 19:19:30 PDT 2009


Anybody know how to strip links from an FTP site?
The script below works well on HTTP sites, but
has rarely or never worked for me on FTP sites.


#
#
# This script extracts all links from the page fetched into $html
# and writes them to links.txt.
#
#
use strict;
use warnings;

use LWP::Simple;
use HTML::TreeBuilder;


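#my $html = get 'http://www.census.gov/geo/www/cob/co2000.html';
my $html = get 'ftp://mcmcftp.er.usgs.gov/Katrina/508dpi/';

# LWP::Simple's get returns undef on failure, so stop early in that case.
defined $html or die "$0: could not fetch the page\n";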

open OUT, ">", 'links.txt' or die "$0: open links.txt: $!";

my $tree = HTML::TreeBuilder->new_from_content($html);
my $links = $tree->extract_links;

foreach (@$links) {
   my ($link, $elem, $attr, $tag) = @$_;
   print OUT qq(<$tag $attr="$link">\n);
}

close OUT;

__END__
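
One possible explanation, offered as an assumption and not verified against this server: for an FTP directory URL, LWP may hand back a plain-text directory listing rather than HTML, in which case HTML::TreeBuilder has no link elements to find. The sketch below avoids HTML parsing and asks the server for the file names directly with Net::FTP. The helper names split_ftp_url, build_links, and ftp_links are hypothetical, anonymous login is assumed, and URLs with a port or user name are not handled.

```perl
use strict;
use warnings;

use Net::FTP;

# Hypothetical helper: split an ftp:// URL into host and directory path.
sub split_ftp_url {
    my ($url) = @_;
    my ($host, $path) = $url =~ m{^ftp://([^/]+)(/.*)?$}
        or die "$0: not an FTP URL: $url\n";
    return ($host, defined $path ? $path : '/');
}

# Hypothetical helper: join a base URL and a list of names into full URLs.
sub build_links {
    my ($base, @names) = @_;
    $base .= '/' unless $base =~ m{/$};
    return map { $base . $_ } @names;
}

# Hypothetical helper: list one directory on an FTP server.
# Assumes the server accepts anonymous logins.
sub ftp_links {
    my ($url) = @_;
    my ($host, $path) = split_ftp_url($url);

    my $ftp = Net::FTP->new($host, Passive => 1)
        or die "$0: connect to $host: $@\n";
    $ftp->login('anonymous', 'anonymous@example.com')
        or die "$0: login: " . $ftp->message;
    $ftp->cwd($path)
        or die "$0: cwd $path: " . $ftp->message;

    # Some servers return bare names here, others include a path prefix.
    my @names = $ftp->ls;
    $ftp->quit;

    return build_links($url, @names);
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    print "$_\n" for ftp_links($ARGV[0]);
}
```

Saved under a name of your choosing (ftplinks.pl here is only an example), it could be run as: perl ftplinks.pl ftp://mcmcftp.er.usgs.gov/Katrina/508dpi/ > links.txt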


