[Wellington-pm] Thank you everyone!

Grant McLean grant at mclean.net.nz
Tue Feb 9 13:50:10 PST 2016


On Wed, 2016-02-10 at 10:20 +1300, Donovan Jones wrote:
> On the subject of "just use grep" for html parsing. I am not entirely
> serious, I have done plenty of not using grep with perl and python
> using xpath or css selectors. My point is more that when you are
> scraping you are at the mercy of whoever produced the html you are
> interested in. This means that A, 90% of the time the semantic markup
> is shit so you can never target the actual thing you are after

One related use case where things aren't quite so bad is when you're
writing regression tests for your own system.  Ideally in that case you
should be able to add the necessary classes/IDs to make writing tests
easy and the result should not be too brittle.  In this case, I'd
definitely recommend CSS selectors / XPath.
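
As a rough sketch of what that can look like in Python, using only the
standard library's ElementTree and its limited XPath subset: the markup,
the "order-item" class, the "order-total" id and the order_summary()
helper are all made up for illustration, and the fragment is assumed to
be well-formed XML. Real-world HTML usually needs a more forgiving
parser than this.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment of a page from your own system. The id and class
# attributes are assumptions: they exist only to give tests stable hooks.
PAGE = """
<div id="order-summary">
  <ul>
    <li class="order-item">Widget</li>
    <li class="order-item">Gadget</li>
  </ul>
  <span id="order-total">42.00</span>
</div>
"""


def order_summary(markup):
    """Return (item names, total) by targeting the test hooks directly."""
    root = ET.fromstring(markup.strip())
    # ElementTree's XPath subset supports [@attr='value'] predicates.
    items = [li.text for li in root.findall(".//li[@class='order-item']")]
    total = root.find(".//span[@id='order-total']").text
    return items, total


items, total = order_summary(PAGE)
```

Because the selectors key off the id and class rather than the position
of the elements, the layout around them can change without breaking the
test.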

Cheers
Grant


