[Kc] HTTP Links

djgoku djgoku at gmail.com
Fri Aug 4 09:13:54 PDT 2006


On 8/4/06, Eric Wilhelm <scratchcomputing at gmail.com> wrote:
> # from djgoku
> # on Friday 04 August 2006 08:44 am:
>
> > What I wanted
> >to do was to parse the html for *pdf links then use File::Fetch to get
> >the PDFs, there might have been a module for this but
>
> If you want a programming exercise, go ahead and write it.  If you want
> a *learning* exercise, learn to search CPAN.  Probably the most
> important thing to learn about Perl is how to not write code.
>
>   http://search.cpan.org/search?query=html+links&mode=all
>
> >not sure how it
> >would act on http links in multiline comments, so I thought I would
> >just create something.
>
> Why are you trying to parse links within the comments?  There's only
> three of them on that page and they appear to be commented for a
> reason.  If you're trying to solve the general case of finding links
> hidden in comments, then use an XML parser to grab the comments and a
> regular expression to look inside them (you can't count on anything in
> the comments being valid HTML.)  Of course, anything automated should
> ignore the comments, but let me know what you come up with so I can add
> comments to my pages that will crash it :-)

No, see, I don't want commented-out lines or http links inside
comments, whether single-line or multiline. My first attempt at the
regexp was printing out the http links; now it isn't. I guess I can
try some modules and see how they fare with multiline comments.
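
For reference, a minimal sketch of the behaviour described above: pull
out the *.pdf links and skip anything inside single-line or multiline
comments. It uses only core Perl and plain regexps, so it is not a
real HTML parser; it assumes quoted href values and well-formed
<!-- ... --> comments. The name pdf_links and the sample page are made
up for illustration.

```perl
use strict;
use warnings;

# Return the href values ending in .pdf, skipping anything inside
# <!-- ... --> comments (single-line or multiline).
# Assumption: hrefs are quoted and comments are properly terminated.
sub pdf_links {
    my ($html) = @_;

    # Drop comments first; /s lets "." match newlines so multiline
    # comments are removed too, and .*? stops at the first "-->".
    $html =~ s/<!--.*?-->//gs;

    # Collect quoted href values that end in ".pdf" (case-insensitive).
    my @links;
    while ($html =~ /href\s*=\s*(["'])(.*?)\1/gis) {
        my $url = $2;
        push @links, $url if $url =~ /\.pdf$/i;
    }
    return @links;
}

# Hypothetical sample page, for illustration only.
my $page = <<'HTML';
<a href="http://example.com/a.pdf">A</a>
<!-- <a href="http://example.com/hidden.pdf">hidden</a> -->
<!--
  <a href="http://example.com/multi.pdf">multi</a>
-->
<a href="http://example.com/index.html">index</a>
<a href='http://example.com/b.PDF'>B</a>
HTML

print "$_\n" for pdf_links($page);
```

Each URL returned could then be handed to File::Fetch, e.g.
File::Fetch->new(uri => $url)->fetch(to => $dir), with $dir being
whatever download directory is wanted. Relative links would need to be
made absolute first, which this sketch does not attempt.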

