[Kc] HTTP Links

Eric Wilhelm scratchcomputing at gmail.com
Fri Aug 4 09:03:23 PDT 2006


# from djgoku
# on Friday 04 August 2006 08:44 am:

> What I wanted
>to do was to parse the html for *pdf links then use File::Fetch to get
>the PDFs, there might have been a module for this but 

If you want a programming exercise, go ahead and write it.  If you want 
a *learning* exercise, learn to search CPAN.  Probably the most 
important thing to learn about Perl is how to not write code.

  http://search.cpan.org/search?query=html+links&mode=all
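For instance, that search turns up HTML::LinkExtor (shipped with the HTML-Parser distribution).  A minimal sketch of the pdf-link half of the job might look like this -- the sample HTML string is made up for illustration, and the real page would come from LWP or similar:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::LinkExtor;

# Toy page standing in for whatever you actually fetched.
my $html = <<'HTML';
<a href="/docs/report.pdf">report</a>
<a href="/index.html">home</a>
HTML

# Collect href values that end in .pdf; HTML::LinkExtor calls the
# callback once per link-bearing tag with the tag name and attributes.
my @pdfs;
my $parser = HTML::LinkExtor->new(sub {
    my ($tag, %attr) = @_;
    return unless $tag eq 'a' and defined $attr{href};
    push @pdfs, $attr{href} if $attr{href} =~ /\.pdf\z/i;
});
$parser->parse($html);
$parser->eof;

print "$_\n" for @pdfs;
```

The resulting list is what you would then hand to File::Fetch, one URL at a time.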

>not sure how it 
>would act on http links in multiline comments, so I thought I would
>just create something.

Why are you trying to parse links within the comments?  There are only 
three of them on that page and they appear to be commented for a 
reason.  If you're trying to solve the general case of finding links 
hidden in comments, then use an XML parser to grab the comments and a 
regular expression to look inside them (you can't count on anything in 
the comments being valid HTML).  Of course, anything automated should 
ignore the comments, but let me know what you come up with so I can add 
comments to my pages that will crash it :-)
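A rough core-Perl sketch of that second step -- note that grabbing the comments themselves with a regex, as done here for brevity, is exactly the shortcut a real parser (e.g. HTML::Parser's comment events) would do more safely:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy input; a real page should have its comments extracted by a
# proper parser, not this regex.
my $html = <<'HTML';
<p>visible</p>
<!-- see http://example.com/a.pdf and maybe
     http://example.com/b.pdf on a second line -->
HTML

my @links;
while ($html =~ /<!--(.*?)-->/sg) {   # each comment, across lines
    my $comment = $1;
    # the comment body need not be valid HTML, so just scan for URLs
    push @links, $comment =~ m{(https?://\S+?\.pdf)\b}gi;
}

print "$_\n" for @links;
```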

--Eric
-- 
Turns out the optimal technique is to put it in reverse and gun it.
--Steven Squyres (on challenges in interplanetary robot navigation)
---------------------------------------------------
    http://scratchcomputing.com
---------------------------------------------------