[Chicago-talk] Testing if Page exists.
Jim Jacobus
JJacobus at PonyX.com
Mon Mar 30 12:06:15 PDT 2015
At 01:00 PM 3/30/2015, you wrote:
>>LWP::Simple & LWP::UserAgent returned the page,
>>but the pages are fairly dense, with a lot of
>>embedded JavaScript, embedded forms and ads that
>>are being served up, none of which I need.
>>It's just taking a lot of time and memory. I
>>was just looking for something that would
>>give me a 404 or 200, or stop reading at
>>some place like the end of the </head> tag. I'm
>>trying to test thousands of URLs, which is
>>the real problem. (This may not be possible.)
>
>You can use the LWP::Simple head() function like
>David said, but head() vs. get() is
>all-or-nothing. There's no way to say "Give
>me the page up to the end of the <head> tag."
That's what I thought.
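A minimal sketch of the head() approach, assuming LWP::Simple is installed; the page_exists() helper and the script layout are hypothetical. head() sends an HTTP HEAD request, so only the response headers come back and the body (scripts, forms, ads) is never downloaded. It reports only success or failure, not the exact status code.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(head);

# Hypothetical helper: true if a HEAD request for $url succeeds, false otherwise.
# head() fetches only the headers, never the page body.
sub page_exists {
    my ($url) = @_;
    # In list context head() returns (content_type, document_length,
    # modified_time, expires, server) on success, or an empty list on failure.
    my @headers = head($url);
    return @headers ? 1 : 0;
}

# Check every URL given on the command line.
for my $url (@ARGV) {
    printf "%s\t%s\n", page_exists($url) ? 'OK' : 'FAIL', $url;
}
```

Run it with the URLs as arguments. A server that builds the whole page before answering may take about as long for a HEAD as for a GET.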
>I'm curious as to how these pages are taking a
>lot of memory. You're not storing them, are
>you? What memory problems are you running into?
Ouch. Just fixed the memory problem: I was
stupidly appending to the string instead of re-using it.
>What's the problem that you're actually
>trying to solve? Is it taking too long to do
>those 100x 0 URL checks? How long is it taking,
>and how long would you like it to take?
The fetches were taking anywhere from 20-60
seconds each. The remote side is taking a long
time to fetch the parts of the page out of
several databases before it starts sending
results. That's a problem of poor design on their
side that I was hoping to work around.
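If the goal is to stop waiting on slow pages, a timeout on an LWP::UserAgent object is one option. A minimal sketch, assuming LWP::UserAgent is installed; check_url() and the 10-second value are hypothetical choices. Unlike LWP::Simple's head(), $ua->head returns an HTTP::Response, so the actual status code (200, 404, ...) is available. LWP's timeout is an inactivity timeout, so a server that sends nothing for longer than the limit is reported as an LWP-generated 500 error instead of being waited on.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# One user agent reused for every URL. The 10-second timeout is an assumed
# value: LWP aborts a request when the connection shows no activity for that
# long, so a server that spends 20-60 s querying databases before sending
# anything is reported as a failure rather than waited on.
my $ua = LWP::UserAgent->new(timeout => 10);

# Hypothetical helper: returns (status_code, status_line) for a HEAD request.
# Connection errors and timeouts come back as LWP-generated 500 responses.
sub check_url {
    my ($url) = @_;
    my $response = $ua->head($url);
    return ($response->code, $response->status_line);
}

# Print the status code, URL and status line for each URL on the command line.
for my $url (@ARGV) {
    my ($code, $status) = check_url($url);
    print "$code\t$url\t$status\n";
}
```

This trades accuracy for speed: pages that exist but are slow to start responding will show up as 500s, so the status line is worth logging to tell a timeout apart from a real server error.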
Thanks for your help.
>--
>Andy Lester => www.petdance.com
>
>_______________________________________________
>Chicago-talk mailing list
>Chicago-talk at pm.org
>http://mail.pm.org/mailman/listinfo/chicago-talk