<table cellspacing="0" cellpadding="0" border="0" ><tr><td valign="top" style="font: inherit;">I am not sure why you even need to run that on a server. A Python program (or one in any other language) could run on your local machine and display the results without the need for a server, emails... That might be a lot simpler.<div><br></div><div>Fred<br><br>--- On <b>Wed, 3/13/13, Luke Woollard <i>&lt;luke.woollard@osmahi.com&gt;</i></b> wrote:<br><blockquote style="border-left: 2px solid rgb(16, 16, 255); margin-left: 5px; padding-left: 5px;"><br>From: Luke Woollard &lt;luke.woollard@osmahi.com&gt;<br>Subject: Re: [plug] Best platform/language to setup a simple web scraper<br>To: "plug@plug.org.au" &lt;plug@plug.org.au&gt;<br>Date: Wednesday, March 13, 2013, 5:41 AM<br><br><div class="plainMail">if you know a little JavaScript and have used jquery at all, it's<br>fairly easy to put something simple together in node.js with jsdom
and<br>jquery.<br><br><br>example-reiwa-title-scraper.js<br>---<br>var jsdom = require('jsdom')<br><br>// Fetch the page and inject jQuery into it before running the callback<br>jsdom.env({<br> html: "<a href="http://reiwa.com.au/home/default.aspx" target="_blank">http://reiwa.com.au/home/default.aspx</a>",<br> scripts: ['<a href="http://code.jquery.com/jquery-1.6.min.js" target="_blank">http://code.jquery.com/jquery-1.6.min.js</a>']<br> }, function(err, window){<br> var $ = window.jQuery;<br> // Read the text of the header link and print it<br> var reiwatitle = $("#wrapReiwaMenuCtrl #header a span").text()<br> console.log(reiwatitle)<br>})<br>---<br><br>To get node.js and npm going on Ubuntu quickly:<br><br>sudo apt-get install python-software-properties python g++ make<br>sudo add-apt-repository ppa:chris-lea/node.js<br>sudo apt-get update<br>sudo apt-get install nodejs npm<br>// from <a href="https://github.com/joyent/node/wiki/Installing-Node.js-via-package-manager"
target="_blank">https://github.com/joyent/node/wiki/Installing-Node.js-via-package-manager</a><br><br>and then you just install jsdom with<br>npm install jsdom -g<br>// -g installs it as a global module. To install it locally to your<br>scraper instead, just run npm install jsdom from your project directory.<br><br>More info on jsdom is available at <a href="https://github.com/tmpvar/jsdom" target="_blank">https://github.com/tmpvar/jsdom</a>, in<br>particular how not to fetch resources like images, stylesheets and<br>scripts.<br><br>Kind Regards<br>Luke John<br><br><br>On Wed, Mar 13, 2013 at 3:16 PM, Michael Van Delft &lt;<a ymailto="mailto:michael@hybr.id.au" href="/mc/compose?to=michael@hybr.id.au">michael@hybr.id.au</a>&gt; wrote:<br>> I’ve been using the reiwa website (and others) to look for houses, in<br>> particular the apartments on 120~130 Terrace Road that sometimes come<br>> up for &lt; $400,000 but usually sell in a week or less.
reiwa has a way<br>> you can save advanced searches and set up email alerts. Unfortunately,<br>> when nothing matches your search, instead of not sending an email, or even<br>> sending one that says “No matches found today”, it spams you with a bunch<br>> of houses that have nothing to do with your search.<br>><br>> I thought, I can fix this: I’ll just set up a simple web scraping script<br>> to do the job for me and I can have fun learning a new tool at the<br>> same time. So far the three options that I am looking at are Yahoo<br>> Pipes, Google App Engine and Scrapy/cron job on a Linode VPS I have.<br>><br>> I’ve never used any of those before so I’m looking for advice. Is<br>> there something else I should be looking at? Or is there any reason to<br>> pick one of those methods over another? How would you approach this?<br>><br>> Regards,<br>> Michael<br>>
_______________________________________________<br>> PLUG discussion list: <a ymailto="mailto:plug@plug.org.au" href="/mc/compose?to=plug@plug.org.au">plug@plug.org.au</a><br>> <a href="http://lists.plug.org.au/mailman/listinfo/plug" target="_blank">http://lists.plug.org.au/mailman/listinfo/plug</a><br>> Committee e-mail: <a ymailto="mailto:committee@plug.org.au" href="/mc/compose?to=committee@plug.org.au">committee@plug.org.au</a><br>> PLUG Membership: <a href="http://www.plug.org.au/membership" target="_blank">http://www.plug.org.au/membership</a><br></div></blockquote></div></td></tr></table>