[Chicago-talk] Testing if Page exists.

Jim Jacobus JJacobus at PonyX.com
Mon Mar 30 12:06:15 PDT 2015


At 01:00 PM 3/30/2015, you wrote:

>>LWP::Simple & LWP::UserAgent returned the page, 
>>but the pages are fairly dense with a lot of 
>>embedded javascript, embedded forms and ads that 
>>are being served up. All of which I don't need. 
>>It's just taking a lot of time and memory. I 
>>was just looking for something that would just 
>>give me a 404 or 200 or stop reading at 
>>some place like the end of the </head> tag. I'm 
>>trying to test out thousands of URLs which is 
>>the real problem. (This may not be possible.)
>
>You can use the LWP::Simple head() function like 
>David said, but head() vs. get() is 
>all-or-nothing.  There’s no way to say “Give 
>me the page up to the end of the <head> tag”.

That's what I thought.
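
For reference, here is a minimal sketch of the status-only check discussed above. It uses the head() method of LWP::UserAgent, so no page body is downloaded. The url_status helper, the 15-second timeout, and reading URLs from the command line are assumptions made for illustration; none of them come from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the numeric HTTP status for $url using a HEAD request,
# so the response body is never transferred.
# $ua is any object with an LWP::UserAgent-style head() method.
sub url_status {
    my ($ua, $url) = @_;
    my $res = $ua->head($url);
    return $res->code;
}

# Check every URL given on the command line.
if (@ARGV) {
    require LWP::UserAgent;    # loaded only when there is work to do

    # The 15-second timeout is an arbitrary choice for this sketch.
    my $ua = LWP::UserAgent->new( timeout => 15 );

    for my $url (@ARGV) {
        printf "%s %s\n", url_status($ua, $url), $url;
    }
}
```

When a request fails on the client side, for instance on a timeout, LWP reports it as a 500 response, so a 500 in the output does not always come from the remote server.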


>I’m curious as to how these pages are taking a 
>lot of memory.  You’re not storing them, are 
>you?  What memory problems are you running into?

Ouch. Just fixed the memory problem. I was 
stupidly appending to the string instead of 
re-using it. That fixed that.
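
To illustrate the kind of bug described here, the sketch below contrasts appending each page to one string with re-using the string. The original code was not posted, so this is a guess at its shape; fake_fetch is a hypothetical stand-in for the real fetch and the sizes are made up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a page fetch: returns a body of $size bytes.
sub fake_fetch {
    my ($size) = @_;
    return 'x' x $size;
}

my @sizes = (1000, 2000, 3000);

# Appending: .= keeps every earlier page, so the string grows
# with each fetch.
my $appended = '';
$appended .= fake_fetch($_) for @sizes;

# Re-using: plain assignment replaces the previous page, so only
# the most recent one is held.
my $reused = '';
$reused = fake_fetch($_) for @sizes;

printf "appended: %d bytes, reused: %d bytes\n",
    length($appended), length($reused);
# prints "appended: 6000 bytes, reused: 3000 bytes"
```

Over thousands of dense pages, the appended string ends up holding every page fetched so far, which would account for the memory growth.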

>What’s the problem that you’re actually 
>trying to solve?  Is it taking too long to do 
>those thousands of URL checks?  How long is it taking, 
>and how long would you like it to take?

The fetches were taking anywhere from 20 to 60 
seconds each. The remote side takes a long 
time to pull the parts of the page out of 
several databases before it starts sending 
results. That's poor design on their 
side that I was hoping to work around.
Thanks for your help.




>--
>Andy Lester => http://www.petdance.com