[plug] A plan for spam spiders.

Craig Ringer craig at postnewspapers.com.au
Sat May 7 18:22:12 WST 2005

On Sat, 2005-05-07 at 17:50 +0800, Shayne O'Neill wrote:
> One of the things that struck me while contemplating spam and the somewhat
> clever teergrubing technique is that these ideas are never applied at the
> coalface of spam crime: the collection of email addresses. Web spiders
> seem to generate a fantastic amount of traffic, and when combined with
> dynamic pages on a web server, quite a lot of webserver load.
> I intend to fight back.

It's been done before, though the fact that many methods are non-trivial
to implement limits their popularity.

It's reasonably common to blacklist an IP that requests a banned page
mentioned in robots.txt. Another trick is to have a page, forbidden in
robots.txt, that generates a maze of virtual links that all load
veeerrrrryyy slooooowlllly and are full of fake email addresses.
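For the curious, here's a rough Python sketch of that second trick. The
port, URL layout, and ten-second delay are all made up for illustration;
the idea is to hang something like this off a path listed as Disallow:
in robots.txt, so only badly-behaved spiders ever reach it:

#!/usr/bin/env python3
# Rough tarpit sketch, not production code. Port, paths and delay
# are illustrative only.
import random
import string
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

def fake_address():
    # Plausible-looking but bogus addresses to poison the harvester.
    user = "".join(random.choices(string.ascii_lowercase, k=8))
    host = "".join(random.choices(string.ascii_lowercase, k=8))
    return f"{user}@{host}.example.com"

class Tarpit(BaseHTTPRequestHandler):
    def do_GET(self):
        # Anything reaching this handler ignored robots.txt, so the
        # client IP is fair game for a blacklist.
        print("tarpit hit from", self.client_address[0])
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        try:
            while True:
                # An endless drip of fake addresses and fresh "links"
                # back into the maze.
                link = (f'<a href="/trap/{random.random()}">'
                        f"{fake_address()}</a><br>\n")
                self.wfile.write(link.encode())
                self.wfile.flush()
                time.sleep(10)  # veeerrrrryyy slooooowlllly
        except (BrokenPipeError, ConnectionResetError):
            pass  # the spider gave up

ThreadingHTTPServer(("", 8080), Tarpit).serve_forever()

As a bonus, every hit logs the client IP, which is exactly what you'd
feed into the blacklisting scheme mentioned above.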

> So when the spider finds a link to "DO NOT CLICK ME AS THIS PAGE WILL
> CRASH YOUR COMPUTER", which is also enticingly placed in robots.txt as
> forbidden fruit, it excitedly clicks through, receives a gzipped html
> file, which it unpacks to view the hidden goodies, and BLAM! 1 gigabyte
> of crud explodes in its head, depleting the spam server's memory and
> vmem and causing the smoke to leak out of its vile little brain.
> The question is: WOULD IT WORK?

I doubt it. Most spiders are likely to just cut the connection after a
certain amount has been downloaded, and frankly I'm not convinced that
many of them support gzipped pages either. I guess it might work on
some, but probably not all that well.

On the other hand, 1GB of repetitive data /does/ gzip down to ~1MB
(gzip it again and you're down to around 4k, but that wouldn't work
for HTTP), so if it does work it'd be pretty darn funny.
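If you want to check that ratio for yourself, a few lines of Python
will do it; the filename is arbitrary, and writing the zeros through
a gzip stream means the full 1GB never has to exist in memory:

import gzip
import os

# 1 GiB of zeros, written through a gzip stream in 1 MiB chunks so
# the uncompressed payload never exists in memory or on disk.
chunk = b"\0" * (1024 * 1024)
with gzip.open("bomb.html.gz", "wb") as f:
    for _ in range(1024):  # 1024 x 1 MiB = 1 GiB uncompressed
        f.write(chunk)

size = os.path.getsize("bomb.html.gz")
print(f"compressed size: {size / (1024 * 1024):.1f} MiB")  # roughly 1 MiB

Serve the result with Content-Encoding: gzip and a text/html
Content-Type, and any client that honours the encoding header will try
to inflate the whole gigabyte.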

Craig Ringer
