[plug] Best platform/language to setup a simple web scraper

Fred Janon fjanon at yahoo.com
Wed Mar 13 10:33:42 UTC 2013


I am not sure why you even need to run that on a server. A Python (or any other language) program could run on your local machine and display the results, with no need for a server or emails. That might be a lot simpler.
Fred

--- On Wed, 3/13/13, Luke Woollard <luke.woollard at osmahi.com> wrote:

From: Luke Woollard <luke.woollard at osmahi.com>
Subject: Re: [plug] Best platform/language to setup a simple web scraper
To: "plug at plug.org.au" <plug at plug.org.au>
Date: Wednesday, March 13, 2013, 5:41 AM

If you know a little JavaScript and have used jQuery at all, it's
fairly easy to put something simple together in node.js with jsdom and
jQuery.


example-reiwa-title-scraper.js
---
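var jsdom = require('jsdom');

// Load the page and inject jQuery so the DOM can be queried with selectors.
jsdom.env({
    html: "http://reiwa.com.au/home/default.aspx",
    scripts: ['http://code.jquery.com/jquery-1.6.min.js']
  }, function(err, window){
    // Stop early if the page or the injected script could not be loaded.
    if (err) {
      console.error(err);
      return;
    }
    var $ = window.jQuery;
    var reiwatitle = $("#wrapReiwaMenuCtrl #header a span").text();
    console.log(reiwatitle);
  }
);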
---

To get node.js and npm going on Ubuntu quickly:

sudo apt-get install python-software-properties python g++ make
sudo add-apt-repository ppa:chris-lea/node.js
sudo apt-get update
sudo apt-get install nodejs npm
// from https://github.com/joyent/node/wiki/Installing-Node.js-via-package-manager

and then you just install jsdom with
npm install jsdom -g
// -g installs it as a global module. To have it installed locally to
// your scraper instead, just run npm install jsdom from your project directory.

More info on jsdom is available at https://github.com/tmpvar/jsdom, in
particular how not to fetch resources like images, stylesheets and
scripts.
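
Once the scraper has pulled out the listings, the filtering Michael asks
for (only Terrace Road apartments in a street-number range and under a
price limit, and no output at all when nothing matches) could be sketched
roughly as below. This is only a sketch under assumptions: the
{ address, price } listing shape, the criteria object, the function names
and the sample data are all made up for illustration, and the real reiwa
markup would need its own selectors to produce them.

```javascript
// Sketch only: the listing shape ({ address, price }) is an assumption,
// not something the reiwa pages are known to provide directly.

// "$385,000" -> 385000. Returns NaN when the text contains no digits.
// Assumes the text holds a single figure, not a price range.
function parsePrice(text) {
  var digits = String(text).replace(/[^0-9]/g, '');
  return digits.length ? parseInt(digits, 10) : NaN;
}

// True when the listing is on the wanted street, within the street-number
// range, and strictly below the price limit.
function matchesSearch(listing, criteria) {
  // Accepts "126 Terrace Road" as well as unit-style "12/126 Terrace Road".
  var m = /^(?:\d+\/)?(\d+)\s+(.+)$/.exec(String(listing.address).trim());
  if (!m) { return false; }
  var number = parseInt(m[1], 10);
  var street = m[2].toLowerCase();
  return street.indexOf(criteria.street.toLowerCase()) === 0 &&
    number >= criteria.minNumber &&
    number <= criteria.maxNumber &&
    parsePrice(listing.price) < criteria.maxPrice;
}

function findMatches(listings, criteria) {
  return listings.filter(function (listing) {
    return matchesSearch(listing, criteria);
  });
}

// Made-up sample data for illustration.
var criteria = { street: 'Terrace Road', minNumber: 120, maxNumber: 130, maxPrice: 400000 };
var sample = [
  { address: '12/126 Terrace Road', price: '$385,000' },
  { address: '126 Terrace Road', price: '$450,000' },
  { address: '5 Hay Street', price: '$300,000' }
];

// Prints one line per match, and nothing at all when there are none.
findMatches(sample, criteria).forEach(function (listing) {
  console.log(listing.address + ': ' + listing.price);
});
```

Run from cron, a script like this stays silent on days with no matches,
so any output it does produce is worth reading.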

Kind Regards
Luke John


On Wed, Mar 13, 2013 at 3:16 PM, Michael Van Delft <michael at hybr.id.au> wrote:
> I’ve been using the reiwa website (and others) to look for houses. In
> particular the apartments on 120~130 Terrace Road that sometimes come
> up for < $400,000 but usually sell in a week or less. reiwa has a way
> you can save advanced searches and set up email alerts. Unfortunately,
> when nothing matches your search, instead of sending no email or even
> an email that says “No matches found today”, it spams you with a bunch
> of houses that have nothing to do with your search.
>
> I thought, I can fix this: I’ll just set up a simple web scraping script
> to do the job for me, and I can have fun learning a new tool at the
> same time. So far the three options that I am looking at are Yahoo
> Pipes, Google App Engine and Scrapy/cron job on a Linode VPS I have.
>
> I’ve never used any of those before, so I’m looking for advice. Is
> there something else I should be looking at? Or is there any reason to
> pick one of those methods over another? How would you approach this?
>
> Regards,
> Michael
> _______________________________________________
> PLUG discussion list: plug at plug.org.au
> http://lists.plug.org.au/mailman/listinfo/plug
> Committee e-mail: committee at plug.org.au
> PLUG Membership: http://www.plug.org.au/membership