[plug] Best platform/language to setup a simple web scraper
Michael Bramwell
mbramwell at gmail.com
Wed Apr 3 07:03:59 UTC 2013
I'm a little late to this thread, but if you like learning new things I
highly recommend using golang on GAE. It's a rather nice language and is
made by the likes of Ken Thompson.
On 13/03/13 10:44 PM, Fred Janon wrote:
> Sounds like a simple enough project for GAE. You can develop your app
> locally and then deploy it with the GAE tools.
>
> Fred
>
> --- On Wed, 3/13/13, Michael Van Delft <michael at hybr.id.au> wrote:
>
>
> From: Michael Van Delft <michael at hybr.id.au>
> Subject: Re: [plug] Best platform/language to setup a simple web
> scraper
> To: plug at plug.org.au
> Date: Wednesday, March 13, 2013, 2:58 PM
>
> Hi Guys,
>
> Thanks for the advice. Just to clarify why I'd run it on a server:
> half the reason I'm doing this is that I'd like to get an alert
> when a new house is posted online. I could just run my saved search
> every day, but that seems very manual; I'd like something that is
> set and forget.
>
> The other half of the reason is learning. I want to know a bit more
> about what Google App Engine is/does, and I find I learn best if I've
> got a hands-on project. So far I think I'll go with a Python script
> and maybe play with BeautifulSoup. I'd say I have a basic
> understanding of Python, but I'd like to be dangerous.
>
> Cheers,
> Michael
>
> On Wed, Mar 13, 2013 at 8:19 PM, Andrew Elwell
> <andrew.elwell at gmail.com> wrote:
> >> To be honest, I'd use whatever language you are naturally proficient
> >> in. Web scraping is not exactly a black art, or overly difficult. I've
> >> done plenty of it with wget and bash.
> >
> > +1 to this (I used to use Perl before becoming more confident with
> > BeautifulSoup)
> > test on a local copy of the page 1st (curl -Lo test.html
> > http://example.com/foo.html) so you can tweak your parsing without
> > having to wait for the remote end (esp if page generation takes a
> > while)
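
To make the "test on a local copy first" advice concrete, here is a rough
sketch of parsing a saved page offline. It uses only the Python standard
library (html.parser) so it runs without installing anything; BeautifulSoup
would do the same job in fewer lines. The markup, the "listing" class name
and the ListingParser / parse_listings names are made up for illustration
and are not from any real site. The sample is written to a temporary
test.html to stand in for the file curl would have saved.

```python
from html.parser import HTMLParser
from pathlib import Path
import tempfile

# Hypothetical markup standing in for the page saved by curl.
SAMPLE_HTML = """<html><body>
<ul>
  <li class="listing"><a href="/house/1">12 Example St</a></li>
  <li class="listing"><a href="/house/2">3 Sample Rd</a></li>
</ul>
</body></html>"""


class ListingParser(HTMLParser):
    """Collect (href, link text) pairs for links inside <li class="listing">."""

    def __init__(self):
        super().__init__()
        self.in_listing = False
        self.current_href = None
        self.listings = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "listing":
            self.in_listing = True
        elif tag == "a" and self.in_listing:
            self.current_href = attrs.get("href")

    def handle_data(self, data):
        # The first non-blank text after an <a> inside a listing is its label.
        if self.current_href is not None and data.strip():
            self.listings.append((self.current_href, data.strip()))
            self.current_href = None

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_listing = False


def parse_listings(path):
    """Parse a locally saved HTML file and return the listings found."""
    parser = ListingParser()
    parser.feed(Path(path).read_text(encoding="utf-8"))
    return parser.listings


# Write the sample to a temp file so the sketch runs without network access.
with tempfile.TemporaryDirectory() as tmp:
    local_copy = Path(tmp) / "test.html"
    local_copy.write_text(SAMPLE_HTML, encoding="utf-8")
    listings = parse_listings(local_copy)

print(listings)
```

Once the parsing looks right against the local file, pointing
parse_listings at a freshly downloaded copy is the only change needed.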
> >
> > use an interactive session with whatever tool you use to tweak and
> > keep checking variables match what you expect
> >
> > Finally, double check there's not a machine-readable version of the
> > info you want hidden away on the site. It's nicer to both ends if you
> > can simply pull in some JSON without having to parse tables. If the
> > remote site has a decent (ha!) web team, they may be happy to work
> > with you. YMMV.
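
If the site does expose JSON, the scraper shrinks to almost nothing. A
minimal sketch, assuming a payload shaped like the sample below: the
"results", "id" and "address" field names are invented here, and a real
feed will look different. The new_listings helper is also my own addition,
showing one way to get the "alert only on new houses" behaviour by
remembering the ids seen on earlier runs.

```python
import json

# Hypothetical payload; a real site's field names will differ.
SAMPLE_JSON = """
{"results": [
  {"id": 101, "address": "12 Example St", "price": 450000},
  {"id": 102, "address": "3 Sample Rd", "price": 525000}
]}
"""


def extract_listings(payload):
    """Return (id, address) pairs from a decoded JSON document."""
    return [(item["id"], item["address"]) for item in payload["results"]]


def new_listings(listings, seen_ids):
    """Keep only listings whose id has not been seen on an earlier run."""
    return [entry for entry in listings if entry[0] not in seen_ids]


listings = extract_listings(json.loads(SAMPLE_JSON))

# Pretend listing 101 was already reported by a previous run.
fresh = new_listings(listings, seen_ids={101})
print(fresh)
```

In a set-and-forget setup the seen ids would be stored somewhere
persistent between runs (a file, or the datastore on GAE) rather than
passed in by hand.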
> >
> > Andrew
> > _______________________________________________
> > PLUG discussion list: plug at plug.org.au
> > http://lists.plug.org.au/mailman/listinfo/plug
> > Committee e-mail: committee at plug.org.au
> > PLUG Membership: http://www.plug.org.au/membership