[plug] A plan for spam spiders.

Shayne O'Neill shayne at guild.murdoch.edu.au
Sun May 8 21:33:09 WST 2005


Yeah, there's a protocol for this.

Accept-Encoding: gzip is where the client tells the server that it'll cope
with gzip.
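
As a rough illustration of what that header says, here is a minimal
Python sketch that decides whether a request advertises gzip. The
function name accepts_gzip is made up for this example, and the parsing
is deliberately simplified (it honours "q=0" as a refusal but ignores
the "*" wildcard).

```python
def accepts_gzip(accept_encoding):
    """Return True if an Accept-Encoding header value advertises gzip.

    Simplified sketch: handles comma-separated codings and q-values,
    but does not interpret the "*" wildcard.
    """
    if not accept_encoding:
        # Header missing or empty: the client made no promise about gzip.
        return False
    for item in accept_encoding.split(","):
        parts = [p.strip() for p in item.split(";")]
        coding = parts[0].lower()
        q = 1.0  # default weight when no q-value is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable weight as a refusal
        if coding in ("gzip", "x-gzip") and q > 0:
            return True
    return False
```

With the header from the session below, accepts_gzip("gzip, deflate")
comes out true, while a client that sends no Accept-Encoding at all
comes out false.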

Here's a dump from a session with the Slashdot server:


Connected to slashdot.org.
Escape character is '^]'.
******** I SEND *********
GET / HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)
Host: slashdot.org <--No idea what I'm supposed to send here!
Connection: Keep-Alive

******** I RECEIVE *******
HTTP/1.1 200 OK
Date: Sun, 08 May 2005 13:27:39 GMT
Server: Apache/1.3.33 (Unix) mod_gzip/1.3.26.1a mod_perl/1.29
SLASH_LOG_DATA: shtml
X-Powered-By: Slash 2.005000
X-Bender: Shooting DNA at each other to make babies. I find it offensive!
Cache-Control: private
Pragma: private
Vary: User-Agent,Accept-Encoding
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html; charset=iso-8859-1
Content-Encoding: gzip

Bunch of gzip jibbajabba starts here..... (NO MIME Headers for the gzip
btw)
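
For anyone wanting to poke at a body like that, here is a small Python
sketch of how a client might unwrap it, given the Transfer-Encoding:
chunked and Content-Encoding: gzip headers above: strip the chunk
framing first, then gunzip. The helper names decode_chunked and
encode_chunked are invented for this example, the sample page is made
up, and chunk trailers are ignored.

```python
import gzip


def decode_chunked(raw: bytes) -> bytes:
    """Strip HTTP/1.1 chunked framing and return the joined payload."""
    body = bytearray()
    pos = 0
    while True:
        eol = raw.index(b"\r\n", pos)
        # The size line is hexadecimal; drop any ";extension" after it.
        size = int(raw[pos:eol].split(b";", 1)[0], 16)
        if size == 0:
            break  # a zero-length chunk marks the end of the body
        start = eol + 2
        body += raw[start:start + size]
        pos = start + size + 2  # skip the CRLF that follows the chunk data
    return bytes(body)


def encode_chunked(data: bytes, chunk_size: int = 16) -> bytes:
    """Build a chunked body, only so the sketch has something to decode."""
    out = bytearray()
    for i in range(0, len(data), chunk_size):
        piece = data[i:i + chunk_size]
        out += b"%x\r\n" % len(piece) + piece + b"\r\n"
    out += b"0\r\n\r\n"
    return bytes(out)


# Made-up page standing in for the real response body.
page = b"<html><body>hello from a gzipped page</body></html>"
wire = encode_chunked(gzip.compress(page))
recovered = gzip.decompress(decode_chunked(wire))
```

The point is just that the gzip stream starts straight after the blank
line, wrapped only in the chunk framing.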

I'm figuring the trick would be to custom-write a server that just ignores
the Accept-Encoding: bit altogether. (Or perhaps coerce Apache to ignore it
somehow.)
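
A minimal sketch of such a server, using Python's standard http.server
module rather than Apache. Everything named here (ForceGzipHandler,
make_server, the PAGE contents) is invented for illustration; it is a
toy under those assumptions, not something to put in production.

```python
import gzip
from http.server import BaseHTTPRequestHandler, HTTPServer

# Made-up page content for the example.
PAGE = b"<html><body>Contact: someone at example.org</body></html>"


class ForceGzipHandler(BaseHTTPRequestHandler):
    """Always answers with a gzipped body, whatever the client asked for."""

    def do_GET(self):
        # Deliberately never look at the Accept-Encoding request header.
        body = gzip.compress(PAGE)
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=iso-8859-1")
        self.send_header("Content-Encoding", "gzip")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Suppress the default per-request logging to stderr.
        pass


def make_server(port=0):
    """Create the server on localhost; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), ForceGzipHandler)
```

Calling make_server(8080).serve_forever() would run it on port 8080. A
client that can't gunzip just gets bytes it can't read.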

Here's a bit of reckoning I have. If people started doing this, spam
spiders would only be able to respond by not accepting gzip streams. Then
folks could quite easily protect themselves by forcing gzip and thus
locking out those clients that won't accept it.

Does anyone know how to get Apache to log whether the Accept-Encoding:
field specifies compression of any sort? I'd like to get some stats on how
often this is used, both in browsers AND in spiders.
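
One possible route, offered as a sketch rather than a tested recipe:
Apache's mod_log_config can log any request header with the
%{Header-Name}i format string, so a LogFormat ending in
"%{User-Agent}i" "%{Accept-Encoding}i" should capture what is needed.
The Python below then tallies such a log. It assumes, hypothetically,
that the last two quoted fields on each line are the User-Agent and the
Accept-Encoding value in that order; the function name
tally_compression and that log layout are my own choices.

```python
import re
from collections import Counter

# Assumed layout: ... "<User-Agent>" "<Accept-Encoding>" at the end of the line.
LAST_TWO_QUOTED = re.compile(r'"([^"]*)"\s+"([^"]*)"\s*$')

# Tokens taken to mean "some sort of compression" in Accept-Encoding.
COMPRESSION_TOKENS = ("gzip", "deflate", "compress")


def tally_compression(lines):
    """Count requests per (user_agent, advertises_compression) pair.

    Simplified: a substring match on the header value, so q-values
    such as "gzip;q=0" are not interpreted.
    """
    counts = Counter()
    for line in lines:
        match = LAST_TWO_QUOTED.search(line)
        if not match:
            continue  # line does not fit the assumed layout
        user_agent, encoding = match.group(1), match.group(2).lower()
        # Apache writes "-" when the request header was absent.
        wants = any(tok in encoding for tok in COMPRESSION_TOKENS)
        counts[(user_agent, wants)] += 1
    return counts
```

Feeding it open("access_log") (or whatever the log file is called)
gives one count per user agent and answer, which should be enough to
compare browsers against spiders.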

> >That's a very good point. I wouldn't particularly want to have to
> >regularly compress 1GB streams on the fly, though it probably wouldn't
> >actually be too bad.
>
> De-compression isn't too bad. I suspect that the Apache module is
> designed so that clients that can't handle the gzipped will have it
> de-compressed for them on the server. It makes more sense to store
> gzipped on the server.
>
> >I have absolutely no idea how to send a pre-compressed file.
>
> Through HTML :-) The client (if it says can handle it) should
> decompress as it receives the data stream.
> --
> /"\ Bernd Felsche - Innovative Reckoning, Perth, Western Australia
> \ /  ASCII ribbon campaign | I'm a .signature virus!
>  X   against HTML mail     | Copy me into your ~/.signature
> / \  and postings          | to help me spread!


