[plug] mirroring and updating a remote http directory
Ben Jensz
plug at jensz.id.au
Fri Aug 6 11:29:06 WST 2004
I'm trying to mirror a remote HTTP directory and the files contained
within it. That part is fine; wget does it without trouble (with -r).
What I want is to have wget run at set intervals and update only the
contents that have changed in that remote HTTP directory. The problem
is that, for some reason, the "Last-Modified" header is missing on the
files, even though the directory listing page generated by the remote
server (which is Apache) shows the last-modified timestamp. I don't
have control over the remote web server, so I can't change anything at
that end.
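
Since the listing page does show a timestamp for each file, one possible
workaround is to read the timestamps from that page instead of from the
response headers. The sketch below is only an illustration under an
assumed row format; the function name parse_listing and the row layout
are hypothetical, and the real page source would need to be checked
before relying on it.

```python
import re
from datetime import datetime

# Assumed (hypothetical) listing row layout; verify against the real page:
#   <a href="name">name</a>   06-Aug-2004 11:29   2.1M
# Links starting with "?" (column sort links) or "/" (parent directory)
# are skipped.
ROW = re.compile(
    r'<a href="(?P<name>[^"?/][^"]*)">.*?</a>\s+'
    r'(?P<stamp>\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2})'
)


def parse_listing(html: str) -> dict:
    """Map each listed name to the timestamp shown next to it."""
    entries = {}
    for match in ROW.finditer(html):
        # %b expects English month abbreviations (C/English locale).
        stamp = datetime.strptime(match.group("stamp"), "%d-%b-%Y %H:%M")
        entries[match.group("name")] = stamp
    return entries
```

The resulting timestamps could then be compared with the modification
times of the local copies to decide which files to fetch again.
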
So what happens is that if you tell wget to look for changes and
download only the changed files, it can't get a timestamp for the
files, so it downloads every single file again. According to the wget
man page, wget is supposed to look at the timestamp and/or the file
size when figuring out whether a file has been modified since it was
last retrieved. But it doesn't seem to pay attention to the
Content-Length info, even though it does receive it.
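
A size-only comparison like the one described above can be sketched
outside wget. This is a rough illustration, not how wget itself works:
the function names are hypothetical, it assumes the server answers HEAD
requests with a Content-Length header, and it will miss a file whose
contents changed without its size changing.

```python
import os
import urllib.request
from typing import Optional


def remote_content_length(url: str) -> Optional[int]:
    """Send a HEAD request; return Content-Length, or None if absent."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=30) as response:
        value = response.headers.get("Content-Length")
    return int(value) if value is not None else None


def local_size(path: str) -> Optional[int]:
    """Return the local file size in bytes, or None if it is missing."""
    try:
        return os.path.getsize(path)
    except OSError:
        return None


def needs_download(local: Optional[int], remote: Optional[int]) -> bool:
    """Size-only check: fetch when a size is unknown or the sizes differ."""
    if local is None or remote is None:
        return True
    return local != remote
```

A mirroring script could call needs_download(local_size(path),
remote_content_length(url)) for each file and fetch only those for
which it returns True.
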
The reason it's a problem is that the files contained in the directory
are a couple of meg each and there are quite a lot of them, so I don't
want to be re-mirroring possibly 100-150MB each time, when there might
be only 5MB of changed files.
Does anyone have suggestions for how I could overcome this? Alternative
tools, etc.?
TIA.
/ Ben