[plug] A plan for spam spiders.

Craig Ringer craig at postnewspapers.com.au
Sat May 7 18:22:12 WST 2005


On Sat, 2005-05-07 at 17:50 +0800, Shayne O'Neill wrote:
> One of the things that struck me while contemplating spam and the somewhat
> clever teergrubing technique is that these ideas are never applied at the
> coalfront of spam crime: the collection of email addresses. Web spiders
> seem to generate a fantastic amount of traffic, and when combined with
> dynamic pages on a web server, quite a lot of webserver load.
> 
> I intend to fight back.

It's been done before, though the fact that many methods are non-trivial
to implement limits their popularity.

It's reasonably common to blacklist an IP that requests a banned page
mentioned in robots.txt. Another trick is to have a page, forbidden in
robots.txt, that generates a maze of virtual links that all load
veeerrrrryyy slooooowlllly and are full of fake email addresses.
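For the curious, the usual shape of that trap is something like the
sketch below (Python CGI; the paths and filenames are invented for the
example): robots.txt disallows the trap URL, and anything that fetches
it anyway gets its IP appended to a blacklist and is then drip-fed a
slow page of bogus addresses and links back into the same maze.

    #!/usr/bin/env python
    # trap.cgi -- hypothetical sketch only. Pair it with a robots.txt
    # containing:
    #   User-agent: *
    #   Disallow: /trap.cgi
    # so that well-behaved crawlers never see this page.
    import os, random, sys, time

    BLACKLIST = "/var/www/badbots.txt"  # assumed path; feed it to your firewall

    # Anything hitting this URL has ignored robots.txt, so record its address.
    ip = os.environ.get("REMOTE_ADDR", "unknown")
    open(BLACKLIST, "a").write(ip + "\n")

    sys.stdout.write("Content-Type: text/html\r\n\r\n")
    sys.stdout.write("<html><body>\n")
    sys.stdout.flush()

    # Drip out fake addresses and links back into the trap, slowly.
    for i in range(50):
        name = "".join(random.choice("abcdefghijklmnop") for _ in range(8))
        sys.stdout.write('<p>%s@example.com <a href="/trap.cgi?p=%d">more</a></p>\n'
                         % (name, i))
        sys.stdout.flush()
        time.sleep(2)   # the veeerrrrryyy slooooowlllly part

    sys.stdout.write("</body></html>\n")

None of that is bulletproof -- a harvester working through a pool of
proxies won't care much about one blacklisted IP -- but it does make
scraping this particular site more expensive than it's worth.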

> So when the spider finds a link to "DO NOT CLICK ME AS THIS PAGE WILL
> CRASH YOUR COMPUTER" which is also enticingly placed in robots.txt as
> forbidden fruit, it excitedly clicks through, receives a gzipped html
> file, which it unpacks to view the hidden goodies, and BLAM! 1 gigabit of
> crud explodes in its head, depleting the spam server's memory and vmem,
> and causing the smoke to leak out of its vile little brain.
> 
> The question is: WOULD IT WORK?

I doubt it. Most spiders are likely to just cut the connection after
they've downloaded a certain amount, and frankly I'm not convinced that
many of them support gzipped pages either. I guess it might work, but
probably not all that well.

On the other hand, a 1GB file of sufficiently repetitive junk /does/
gzip down to ~1MB (gzip it again and you're down to about 4k, though
that wouldn't work over HTTP), so if it does work it'd be pretty darn
funny.
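If anyone wants to check that ratio for themselves, a throwaway Python
snippet along these lines will do it (the sizes are just for
illustration):

    # Compress 1GB of zeroes in memory and report the resulting size.
    import gzip, io

    CHUNK = b"\0" * (1024 * 1024)          # 1MB of zeroes
    buf = io.BytesIO()
    gz = gzip.GzipFile(fileobj=buf, mode="wb", compresslevel=9)
    for _ in range(1024):                  # 1024 x 1MB = 1GB uncompressed
        gz.write(CHUNK)
    gz.close()
    print("compressed size: %d bytes" % len(buf.getvalue()))

That prints a figure in the region of a megabyte. You'd then save the
compressed output as a static file and serve it with a
"Content-Encoding: gzip" header, so the server never has to hold the
uncompressed gigabyte itself.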

-- 
Craig Ringer



