[tpm] Scrape covid-19.ontario.ca?
zoffix at zoffix.com
Mon Oct 12 15:52:23 PDT 2020
Hi,
> Has anyone tried to scrape
Well, it's against the terms of use, so hopefully no one :)
https://covid19results.ehealthontario.ca:4443/terms
The page uses a lot of JavaScript, which WWW::Mechanize doesn't run; see
https://metacpan.org/pod/distribution/WWW-Mechanize/lib/WWW/Mechanize/FAQ.pod#JavaScript
You can pop open the browser console (usually F12), go to the Network
tab, and see what requests are made when you submit the form. Then
you can try to replicate those requests with Mech.
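A rough sketch of that replay step, for a site whose terms do allow
automated access. Everything specific here is made up for illustration:
the example.com URLs, the field names, and the build_replay_request
helper are placeholders for whatever the Network tab actually shows.
It only builds and prints the request; the lines that send it with
Mech are left commented out.

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);

# Hypothetical helper: rebuild a form POST seen in the Network tab.
# $url     - request URL copied from the browser
# $form    - array ref of field => value pairs (keeps the field order)
# $headers - hash ref of extra headers the browser sent
sub build_replay_request {
    my ($url, $form, $headers) = @_;
    return POST($url, %{ $headers || {} }, Content => $form);
}

# Placeholder values; substitute what the browser actually sent.
my $request = build_replay_request(
    'https://example.com/api/lookup',
    [ card_number => '0000000000', dob => '1970-01-01' ],
    {
        Referer            => 'https://example.com/form',
        'X-Requested-With' => 'XMLHttpRequest',
    },
);

# Inspect the request before sending anything.
print $request->as_string;

# To send it with Mech (a subclass of LWP::UserAgent):
# use WWW::Mechanize;
# my $mech     = WWW::Mechanize->new( autocheck => 1 );
# my $response = $mech->request($request);
# print $response->decoded_content;
```

Comparing the printed request with what the browser shows is a quick
way to spot a missing header or field before blaming the server.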
I see a ReCaptcha thing at the bottom, so I'd assume there's more
protection on that site.
Cheers,
ZZ
Quoting Felipe Gasper <felipe at felipegasper.com>:
> Hi all,
>
> Has anyone tried to scrape
> https://covid19results.ehealthontario.ca:4443/agree?
>
> It’s not working for me w/ WWW::Mechanize, which I’m guessing is
> because the server does some sort of verification that the various
> line-noise-ish Google JS URLs embedded in the page actually get
> loaded.
>
> Thoughts, anyone? Thanks!
>
> cheers,
> -Felipe Gasper
> Mississauga
> _______________________________________________
> toronto-pm mailing list
> toronto-pm at pm.org
> https://mail.pm.org/mailman/listinfo/toronto-pm