[plug] mirroring and updating a remote http directory

Ben Jensz plug at jensz.id.au
Fri Aug 6 11:29:06 WST 2004


I'm trying to mirror a remote HTTP directory and the files contained 
within it.  That part is fine; wget -r does it without trouble.  What I 
want is to run wget at set intervals so that it picks up any changed 
contents of that remote directory.  The problem is that for some reason 
the "Last-Modified" header is missing on the files, even though the 
directory listing page generated by the remote server (which is Apache) 
shows a last-modified time-stamp for each one.  I don't have control 
over the remote web server, so I can't change anything at that end.

So what happens is that if you tell wget to look for changes and 
download only the changed files, it can't get a time-stamp for any of 
them, so it downloads every single file again.  According to the wget 
man page, wget is supposed to look at the time-stamp and/or the file 
size when figuring out whether a file has been modified since it was 
last retrieved.  But it doesn't seem to pay attention to the 
Content-Length info, even though it does receive that.
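
For illustration, here is a minimal Python sketch of the size-only 
comparison described above.  It assumes the server answers HEAD requests 
with a Content-Length header, which may not hold for every server.  The 
function names (remote_size, local_size, needs_download, update_file) 
are hypothetical and are not part of wget.  Note the limitation: a file 
that changes without changing size would be missed.

```python
import os
import urllib.request


def remote_size(url, timeout=30):
    """Return the Content-Length reported for url via HEAD, or None."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        length = resp.headers.get("Content-Length")
    return int(length) if length is not None else None


def local_size(path):
    """Return the size of a local file in bytes, or None if it is missing."""
    try:
        return os.path.getsize(path)
    except OSError:
        return None


def needs_download(local, remote):
    """Decide from sizes alone whether a file should be fetched again."""
    if local is None:       # never mirrored before
        return True
    if remote is None:      # no Content-Length given; fetch to be safe
        return True
    return local != remote  # same size is treated as unchanged


def update_file(url, path):
    """Fetch url into path only when the size check says it changed."""
    if needs_download(local_size(path), remote_size(url)):
        urllib.request.urlretrieve(url, path)
        return True
    return False
```

Calling update_file() once per file in the listing, from cron or 
similar, would re-fetch only the files whose size differs from the 
local copy.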

The reason it's a problem is that the files in the directory are a 
couple of MB each and there are quite a lot of them, so I don't want to 
be re-mirroring possibly 100-150MB each time when there might be only 
5MB of changed files.

Anyone got any suggestions on how I could overcome this?  Alternative 
tools etc?

TIA.


/ Ben



