<table cellspacing="0" cellpadding="0" border="0" ><tr><td valign="top" style="font: inherit;">Sounds like a simple enough project for GAE. You can develop your app locally and then deploy it with the GAE tools.<div><br></div><div>Fred<br><br>--- On <b>Wed, 3/13/13, Michael Van Delft <i>&lt;michael@hybr.id.au&gt;</i></b> wrote:<br><blockquote style="border-left: 2px solid rgb(16, 16, 255); margin-left: 5px; padding-left: 5px;"><br>From: Michael Van Delft &lt;michael@hybr.id.au&gt;<br>Subject: Re: [plug] Best platform/language to setup a simple web scraper<br>To: plug@plug.org.au<br>Date: Wednesday, March 13, 2013, 2:58 PM<br><br><div class="plainMail">Hi Guys,<br><br>Thanks for the advice, just to clarify why I'd run it on a server;<br>Half the reason I'm doing this is because I'd like to get an alert<br>when a new house is posted online, I could just run my saved search<br>every day but that seems very manual, I'd like something that is just<br>set and
forget.<br><br>The other half of the reason is learning: I want to know a bit more<br>about what Google App Engine is/does and I find I learn best if I've<br>got a hands-on project. So far I think I'll go with a Python script<br>and maybe play with BeautifulSoup, I'd say I have a basic<br>understanding of Python but I'd like to be dangerous.<br><br>Cheers,<br>Michael<br><br>On Wed, Mar 13, 2013 at 8:19 PM, Andrew Elwell &lt;<a ymailto="mailto:andrew.elwell@gmail.com" href="/mc/compose?to=andrew.elwell@gmail.com">andrew.elwell@gmail.com</a>&gt; wrote:<br>>> To be honest, I'd use whatever language you are naturally proficient in. Web<br>>> scraping is not exactly a black art, or overly difficult. I've done plenty<br>>> of it with wget and bash.<br>><br>> +1 to this (I used to use Perl before becoming more confident with<br>> BeautifulSoup)<br>> test on a local copy of the page 1st (curl -Lo test.html<br>> <a
href="http://example.com/foo.html" target="_blank">http://example.com/foo.html</a>) so you can tweak your parsing without<br>> having to wait for the remote end (esp if page generation takes a<br>> while)<br>><br>> use an interactive session with whatever tool you use to tweak and<br>> keep checking variables match what you expect<br>><br>> finally double check there's not a machine readable version of the<br>> info you want hidden away on the site - it's nicer to both ends if you<br>> can simply pull in some JSON without having to parse tables. - if<br>> remote site has a decent (ha!) web team / they may be happy to work<br>> with you. YMMV.<br>><br>> Andrew<br>> _______________________________________________<br>> PLUG discussion list: <a ymailto="mailto:plug@plug.org.au" href="/mc/compose?to=plug@plug.org.au">plug@plug.org.au</a><br>> <a href="http://lists.plug.org.au/mailman/listinfo/plug"
target="_blank">http://lists.plug.org.au/mailman/listinfo/plug</a><br>> Committee e-mail: <a ymailto="mailto:committee@plug.org.au" href="/mc/compose?to=committee@plug.org.au">committee@plug.org.au</a><br>> PLUG Membership: <a href="http://www.plug.org.au/membership" target="_blank">http://www.plug.org.au/membership</a><br></div></blockquote></div></td></tr></table>