[plug] Diff-ing and patch-ing binary files

Ryan ryan at slowest.net
Mon Mar 25 16:32:01 WST 2002


The question:

I am trying to find a way to diff and patch binary files.  

I can only get 'diff' to give yes/no answers when comparing binary
files.  'cmp' will give me the byte offsets of the differences, but
nothing in a form that I could later use to patch a binary, to my
knowledge.

Does anyone know of a way to use these tools to do it, or of other tools
which may help?  I thought some of the backup utilities might offer
such a feature, but I can't find any that mention it yet.  I guess it
would best be described as 'differential versioning'?

Rsync supposedly sends differential information across for syncing files
(which could be done locally) - but I'm unsure if it does this for file
contents or just for the file list ... and if it can, how to get that
diff information into a separate file rather than applying it to the
destination file.

Effectively I want to be able to do this:

. repeatedly

$ diff --binary {reference file} {latest copy} > data.{x}.patch

. at some date in the future when Mr. Poo slaps Mr. Fan

$ patch {reference file} data.{x}.patch
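
To make the round trip above concrete, here is a minimal sketch in
Python using difflib.SequenceMatcher from the standard library, which
accepts bytes as input.  The names make_patch and apply_patch and the
tuple-based patch format are hypothetical, chosen only for
illustration.  It holds both files in memory and is far too slow for
multi-gigabyte files, so treat it as a picture of the idea rather than
a usable tool.

```python
from difflib import SequenceMatcher


def make_patch(reference: bytes, latest: bytes) -> list:
    """Return a list of operations that rebuild `latest` from `reference`.

    Hypothetical patch format: ("copy", offset, length) refers to bytes
    already present in the reference file; ("data", payload) carries
    bytes that only exist in the latest copy.
    """
    matcher = SequenceMatcher(None, reference, latest, autojunk=False)
    patch = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged run: store only its position in the reference.
            patch.append(("copy", i1, i2 - i1))
        elif j2 > j1:
            # "replace" or "insert": store the new bytes themselves.
            patch.append(("data", latest[j1:j2]))
        # "delete": nothing to store, the bytes are simply not copied.
    return patch


def apply_patch(reference: bytes, patch: list) -> bytes:
    """Rebuild the latest copy from the reference file and a patch."""
    out = bytearray()
    for op in patch:
        if op[0] == "copy":
            _, offset, length = op
            out += reference[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)
```

Only the "data" entries cost real space, so a patch for a mostly
unchanged file stays small.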


And the reason for all this:

I have several large data files from a certain undisclosed email
program.  A few users refuse to believe that items deleted from the
deleted items folder don't hang around until they are needed, so the
files need backing up at least daily, with about 50-100 versions stored
(I've well and truly voiced [screamed] my opinion on this, to no avail).


Naturally that is going to take up a lot of space (I'm estimating
150-300+ Gig at the end of a 100 version cycle), so I want to store one
base version every so often and only keep the differences for the 50-100
versions rather than the entire file which will be mostly the same as
the base file.  
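
As a rough illustration of the saving, the arithmetic can be sketched
in Python.  The 150-300 Gig estimate over 100 versions implies roughly
1.5-3 Gig per full copy.  The function names and the patch_fraction
parameter (patch size as a fraction of the full file) are hypothetical,
and assuming every patch is the same size is a simplification.

```python
GIB = 1024 ** 3


def full_copies_bytes(file_bytes: int, versions: int) -> int:
    """Space needed when every version is stored as a complete file."""
    return file_bytes * versions


def base_plus_patches_bytes(file_bytes: int, versions: int,
                            patch_fraction: float) -> int:
    """Space needed for one base file plus a patch for each other version.

    patch_fraction is an assumed average patch size, expressed as a
    fraction of the full file size (e.g. 0.02 for 2%).
    """
    patch_bytes = int(file_bytes * patch_fraction)
    return file_bytes + (versions - 1) * patch_bytes
```

With a 3 Gig file and 100 versions, full_copies_bytes gives the 300 Gig
upper estimate, while the second function shows how quickly the total
shrinks as the assumed patch fraction drops.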

If the patch information that is stored ends up being too much more than
the size of the differences, then I'm better off storing the entire
file, so it needs to be reasonably space efficient too.  

I realise that after a large number of fragmented changes from the base
file, there is a point where the patch info will get larger than the
changes themselves.  This is probably when a new base version would be
created.
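
One way to picture that cut-over is a simple size check, sketched here
in Python.  The function name and the max_ratio threshold are
hypothetical; the right threshold would depend on how much overhead is
acceptable.

```python
def should_start_new_base(patch_bytes: int, full_file_bytes: int,
                          max_ratio: float = 0.5) -> bool:
    """Decide whether to store a fresh base file instead of a patch.

    Returns True when the patch is larger than max_ratio times the full
    file, i.e. when keeping a patch no longer saves enough space.
    max_ratio is an assumed tuning knob, not a recommended value.
    """
    if full_file_bytes <= 0:
        raise ValueError("full_file_bytes must be positive")
    return patch_bytes / full_file_bytes > max_ratio
```

Running this after each backup would trigger a new base version once
the accumulated differences against the old base grow past the chosen
ratio.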

TIA

Ryan


