[Kc] HTTP Links

djgoku djgoku at gmail.com
Fri Aug 4 08:44:16 PDT 2006


I was browsing around perl.org and found
(http://www.perl.org/books/beginning-perl/), which has a bunch of HTML
links to downloadable PDFs of the book Beginning Perl. What I wanted
to do was parse the HTML for *.pdf links and then use File::Fetch to get
the PDFs. There might have been a module for this, but I wasn't sure how it
would handle HTTP links inside multi-line comments, so I thought I would
just write something myself.
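For the fetch step, a minimal sketch might look like the following. It assumes the *.pdf URLs have already been collected into an array; fetch_all is a hypothetical helper, not something taken from get_http.pl, and it uses only File::Fetch's new, fetch and error methods.

```perl
use strict;
use warnings;
use File::Fetch;

# Hypothetical helper: download each URL in @$urls into $dir.
# Returns the local paths of the files that were saved; any URL that
# cannot be parsed or fetched is reported on STDERR and skipped.
sub fetch_all {
    my ($urls, $dir) = @_;
    my @saved;
    for my $url (@$urls) {
        # new() returns false if the URI cannot be parsed
        my $ff = File::Fetch->new(uri => $url);
        unless ($ff) {
            warn "Could not parse URI '$url'\n";
            next;
        }
        # fetch() returns the full path of the saved file, or false on failure
        my $where = $ff->fetch(to => $dir);
        if ($where) {
            push @saved, $where;
        }
        else {
            warn "Fetching '$url' failed: ", $ff->error, "\n";
        }
    }
    return @saved;
}
```

Calling fetch_all(\@pdf_urls, 'pdfs') would then try each URL in turn and return the paths of the files it managed to save, skipping (and warning about) any that fail.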

Well, the first night I was up till 2:30AM trying to come up with a regexp to parse
the PDF links, which didn't go too well. I also noticed that some of
the HTML links were commented out, and I didn't want those in my results,
just LIVE links. So I had to work out a way to find single- and multi-line
comments and discard them, while still checking for links in the
prematch of (<!--) and the postmatch of (-->). After two
nights I think I have all the bases covered, though currently my
program just prints the HTML along with the results. I haven't parsed out the
HTTP links yet; that is next, then fetching.
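One way to get the same effect with less bookkeeping is to delete every comment from the whole page before looking for links: a single substitution with the /s modifier covers multi-line comments, and since only the comment itself is removed, links in the prematch and postmatch stay where they are. This is a sketch, not necessarily what get_http.pl does; strip_comments and pdf_links are hypothetical names, and the href regex assumes quoted attribute values, so a real parser such as HTML::Parser (whose distribution includes HTML::LinkExtor) will cope better with arbitrary pages.

```perl
use strict;
use warnings;

# Remove every <!-- ... --> comment, including ones spanning several lines.
sub strip_comments {
    my ($html) = @_;
    # /s lets . match newlines (multi-line comments); the non-greedy .*?
    # stops at the first --> so live text between two comments survives.
    $html =~ s/<!--.*?-->//gs;
    return $html;
}

# Return the href values ending in .pdf (any case) from the live markup.
sub pdf_links {
    my ($html) = @_;
    my $live = strip_comments($html);
    my @links;
    # Expects <a ... href="...pdf"> or href='...pdf'; \1 matches the same quote.
    while ($live =~ /<a\s[^>]*?href\s*=\s*(["'])([^"']*?\.pdf)\1/gi) {
        push @links, $2;
    }
    return @links;
}
```

Feeding the page's HTML to pdf_links would return only the live *.pdf links, in page order, ready to hand to File::Fetch.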

http://djgoku.dyndns.org/dj_goku/get_http.pl


More information about the kc mailing list