[San-Diego-pm] Photo download problem

Joel Fentin joel at fentin.com
Sun Aug 22 16:37:25 PDT 2010


Chris Grau wrote:
>> http://fentin.com/cgi-bin/temp.pl
> [snip]
> 
>> Wikimedia permits the downloading of photos and hotlinking. Yet I
>> can't get it to do anything with Perl. The two URLs above (same photo
>> in each case) work in the FF browser. The third example is from my own
>> website as a test. It downloads content.
> 
> I seem to have the opposite results when running your test code.  I
> receive an error attempting to download wikimedia's content (trimmed):
> 
>     $ HEAD http://upload.wikimedia.org/wikipedia/commons/1/19/PacificSilverFir_7645.jpg
>     403 Forbidden
>     X-Squid-Error: ERR_ACCESS_DENIED 0
> 
> They must be blocking LWP's user agent, because wget works fine.  I
> didn't bother testing a modified user agent string in LWP.

Are you saying that it works fine in the browser because the 
browser has a different user agent? And if so, are you saying that 
I have to spoof a user agent to get the photos?
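To make the user-agent point concrete, here is a minimal sketch. It assumes the LWP::UserAgent module (part of libwww-perl) is installed. It sends its own User-Agent header instead of LWP's default, and it prints the HTTP status line when a request fails. LWP::Simple's get() just returns undef on failure, which hides a "403 Forbidden". The agent string and contact address are hypothetical placeholders. This sketch cannot promise that any particular server will accept a given string.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Build a user agent that sends its own User-Agent header instead of
# LWP's default "libwww-perl/<version>" string.
# The agent string and contact address are hypothetical placeholders.
sub make_agent {
    my $ua = LWP::UserAgent->new(timeout => 30);
    $ua->agent('PlantPhotoFetcher/0.1 (contact: you@example.com)');
    return $ua;
}

# Fetch one URL. On success return the raw bytes; on failure print the
# status line (e.g. "403 Forbidden") and return nothing, instead of
# silently yielding undef the way LWP::Simple::get() does.
sub fetch_image {
    my ($ua, $url) = @_;
    my $res = $ua->get($url);
    return $res->content if $res->is_success;
    warn "$url: ", $res->status_line, "\n";
    return;
}

# Only touch the network when URLs are given on the command line.
if (@ARGV) {
    my $ua = make_agent();
    for my $url (@ARGV) {
        my $data = fetch_image($ua, $url);
        printf "%s: %d bytes\n", $url, length $data if defined $data;
    }
}
```

Run it as `perl fetch.pl URL ...` (the script name is arbitrary). With no arguments it does nothing.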

>> $f = 'http://commons.wikimedia.org/wiki/File:PacificSilverFir_7645.jpg';
>> $Content2 = get($f);
>> print length($Content2)."<br>";
>>
>> $f = 'http://upload.wikimedia.org/wikipedia/commons/1/19/PacificSilverFir_7645.jpg';
>> $Content2 = get($f);
>> print length($Content2)."<br>";
>>
>> $f = 'http://fentin.com/Ecuador/B_Tuncarta-Children.jpg';
>> $Content2 = get($f);
>> print length($Content2)."<br>";
>> print "<BR>DONE";
> 
> What are you trying to accomplish?

My client wants to download thousands of plant pictures. He is 
being very careful about photo credits and copyright issues.

I currently have a list of about 20,000 photos he wants. Wikimedia 
is one of the sources. I wrote a Perl program to loop through the 
list. It failed at the first photo, and I've been trying a variety 
of ways to get that first photo.

> All three get() statements theoretically download the content of
> the URL.  In fact, running your script, the file on your website was
> the only one to successfully download (again, the apparent user
> agent problem).

The fact that you have identified this as a user agent problem is 
helpful. I'm not sure what to do with it yet, but it's much better 
than nothing.

-- 
Joel Fentin       tel: 760-749-8863
Biz Website:      http://fentin.com
Personal Website: http://fentin.com/me

