[tpm] Scrape covid-19.ontario.ca?

zoffix at zoffix.com
Mon Oct 12 15:52:23 PDT 2020


Hi,

> 	Has anyone tried to scrape

Well, it's against the terms of use. So, hopefully no one :)
https://covid19results.ehealthontario.ca:4443/terms

It also relies heavily on JavaScript, which WWW::Mechanize does not
execute (see
https://metacpan.org/pod/distribution/WWW-Mechanize/lib/WWW/Mechanize/FAQ.pod#JavaScript ).

You can open the browser's developer tools (usually F12), go to the
Network tab, and see which requests are made when you submit the form.
Then you'd try to replicate those requests with Mech.

I also see a reCAPTCHA widget at the bottom, so I'd assume the site has
additional protection against automated access.


Cheers,
ZZ



Quoting Felipe Gasper <felipe at felipegasper.com>:

> Hi all,
>
> 	Has anyone tried to scrape  
> https://covid19results.ehealthontario.ca:4443/agree?
>
> 	It’s not working for me w/ WWW::Mechanize, which I’m guessing is  
> because the server does some sort of verification that the various  
> line-noise-ish Google JS URLs embedded in the page actually get  
> loaded.
>
> 	Thoughts, anyone? Thanks!
>
> cheers,
> -Felipe Gasper
> Mississauga
> _______________________________________________
> toronto-pm mailing list
> toronto-pm at pm.org
> https://mail.pm.org/mailman/listinfo/toronto-pm

