[plug] Best platform/language to setup a simple web scraper

Andrew Elwell andrew.elwell at gmail.com
Wed Mar 13 12:19:37 UTC 2013


> To be honest, I'd use whatever language you are naturally proficient in. Web
> scraping is not exactly a black art, or overly difficult. I've done plenty
> of it with wget and bash.

+1 to this (I used to use Perl before becoming more confident with
BeautifulSoup).

Test on a local copy of the page first (curl -Lo test.html
http://example.com/foo.html) so you can tweak your parsing without
having to wait for the remote end (especially if page generation
takes a while).
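
As a rough sketch of that local-copy workflow in Python: the snippet
below reads the saved test.html from disk and pulls out table cells, so
each parsing tweak costs no request to the remote end. It uses the
standard library's html.parser rather than BeautifulSoup purely to stay
dependency-free; the CellCollector class and parse_rows function are
made-up names, and the assumption that the data lives in a plain HTML
table is mine, not something known about any particular site.

```python
# Sketch: parse a saved copy of the page instead of hitting the remote site.
# Assumes test.html was saved earlier with the curl command above, and that
# the data of interest sits in ordinary <tr>/<td> table markup (an assumption).
from html.parser import HTMLParser
from pathlib import Path


class CellCollector(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being built, or None
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            if self._cell is not None and self._row is not None:
                self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr":
            if self._row is not None:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_rows(html):
    """Return the table rows found in an HTML string."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.rows


if __name__ == "__main__":
    # Re-run this as often as needed; only the local file is read.
    print(parse_rows(Path("test.html").read_text(encoding="utf-8")))
```

Once the parsing looks right against the local file, the same
parse_rows call can be pointed at freshly fetched HTML.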

Use an interactive session with whatever tool you pick, so you can
tweak things and keep checking that variables match what you expect.

Finally, double check there's not a machine-readable version of the
info you want hidden away on the site. It's nicer to both ends if you
can simply pull in some JSON without having to parse tables. If the
remote site has a decent (ha!) web team, they may be happy to work
with you. YMMV.
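
To illustrate why the JSON route is nicer, here is a minimal sketch
using only the Python standard library. The field names ("items",
"name", "qty") and the sample payload are invented for illustration;
a real site will have its own layout, if it offers JSON at all.

```python
# Sketch: consuming a machine-readable feed instead of scraping tables.
# The JSON structure below is hypothetical; adjust the keys to whatever
# the real site actually publishes.
import json
from urllib.request import urlopen


def records_from_json(text):
    """Turn a JSON document into a list of (name, qty) tuples."""
    data = json.loads(text)
    return [(item["name"], item["qty"]) for item in data["items"]]


def fetch_text(url):
    """Download a URL and return its body as text (assumes UTF-8)."""
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Sample payload standing in for a real response, so this runs offline.
    sample = '{"items": [{"name": "foo", "qty": 3}]}'
    print(records_from_json(sample))
```

Compared with walking table markup, there is no layout to guess at:
the structure is explicit, and a cosmetic redesign of the page is much
less likely to break it.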

Andrew

