[plug] Managing files across diverse media?

John McCabe-Dansted gmatht at gmail.com
Mon Sep 1 12:10:05 UTC 2014


I have many different disks (primarily on online servers or offline
removable hard disks) storing various files. I want to know that they
are all backed up in some form. Since I have somehow accumulated
terabytes of files (backups of backups, apparently), I don't want to
just back up everything onto new media yet again. I'd like to be able
to quickly maintain a list of sha256 or md5 sums that could be used to:

1) List all files on X that are not duplicated/backed up on other media
2) Deduplicate files on X quickly (using existing md5 hashes)
3) List all files that are not duplicated onto offline or WORM storage
4) List all files that are not duplicated onto offsite storage
5) Match JPGs by EXIF date
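
As a rough sketch of uses 1 and 2, assume each medium already has a
manifest held as a dict mapping file path to hex digest. That layout
and the function names below are assumptions made for illustration,
not part of any existing tool.

```python
def unbacked_files(manifest_x, other_manifests):
    """Return paths on X whose digest appears in none of the other manifests.

    Each manifest is assumed to be a dict: file path -> hex digest.
    """
    seen_elsewhere = set()
    for manifest in other_manifests:
        seen_elsewhere.update(manifest.values())
    return sorted(
        path for path, digest in manifest_x.items()
        if digest not in seen_elsewhere
    )


def duplicate_groups(manifest):
    """Return lists of paths within one manifest that share a digest."""
    by_digest = {}
    for path, digest in manifest.items():
        by_digest.setdefault(digest, []).append(path)
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Uses 3 and 4 would be the same call as use 1, with other_manifests
restricted to the offline/WORM or offsite media respectively.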

It seems to me that I wouldn't be the only person in this boat.
However, there doesn't seem to be a tool even just to quickly update a
list of file hashes. For example, md5deep wants to regenerate hashes
for unmodified files on every run.
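
A minimal sketch of such an incremental update, assuming a file can be
treated as unmodified when its size and modification time match the
previous run. The JSON manifest layout and the names sha256_of and
update_manifest are hypothetical, and the manifest file is assumed to
live outside the tree being scanned.

```python
import hashlib
import json
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def update_manifest(root, manifest_path):
    """Refresh the manifest for root, rehashing only new or changed files.

    Returns (manifest, number_of_files_rehashed). Files that no longer
    exist are dropped from the manifest.
    """
    try:
        with open(manifest_path) as handle:
            old = json.load(handle)
    except FileNotFoundError:
        old = {}

    new = {}
    rehashed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            stat = os.stat(path)
            rel = os.path.relpath(path, root)
            entry = old.get(rel)
            if (entry is not None
                    and entry["size"] == stat.st_size
                    and entry["mtime_ns"] == stat.st_mtime_ns):
                new[rel] = entry  # assumed unchanged: reuse the stored hash
            else:
                new[rel] = {
                    "size": stat.st_size,
                    "mtime_ns": stat.st_mtime_ns,
                    "sha256": sha256_of(path),
                }
                rehashed += 1

    with open(manifest_path, "w") as handle:
        json.dump(new, handle, indent=1, sort_keys=True)
    return new, rehashed
```

Trusting size and mtime is a trade-off: it avoids rereading terabytes
of data, but it will miss a file whose contents changed without either
value changing.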

I am looking at writing a tool to record and manage file IDs across
media [1], but doing this right could take quite a while.

How do other people handle this?


[1] https://github.com/gmatht/joshell/tree/master/mass_file_management

-- 
John C. McCabe-Dansted

