[plug] Filesystems for slow networks

Adon Metcalfe adon.metcalfe at gmail.com
Sun Feb 22 18:54:42 WST 2009


Heh, sorry about the delay, I only get a chance to read plug on the weekend.
As far as replication goes on the client (the way I use it), you specify
each server and volume, i.e.

volume remote1
  type protocol/client
  option transport-type tcp/client
  option remote-host storage1.example.com
  option remote-subvolume brick
end-volume

volume remote2
  type protocol/client
  option transport-type tcp/client
  option remote-host storage2.example.com
  option remote-subvolume brick
end-volume

volume remote3
  type protocol/client
  option transport-type tcp/client
  option remote-host storage3.example.com
  option remote-subvolume brick
end-volume

So there are your three remote servers (remote1 and remote2 are, say, your
office servers and remote3 is your local/home machine), each exporting a
same-sized share (one could even be localhost if your servers and clients
are the same machines, as mine are). Then you join them with afr, as below:

volume mirror0
  type cluster/afr
  option favorite-child remote1
  subvolumes remote1 remote2 remote3
end-volume
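
Each of those "option remote-subvolume brick" lines assumes the storage
hosts export a volume called brick. For completeness, here's a minimal
sketch of what the matching server-side volfile might look like (the
/data/export path and the wide-open auth rule are placeholders, not from
my setup):

volume brick
  type storage/posix
  option directory /data/export        # backing directory on the server's local fs
end-volume

volume server
  type protocol/server
  option transport-type tcp/server
  option auth.ip.brick.allow *         # who may mount "brick"; tighten this for real use
  subvolumes brick
end-volume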

So in your use case there are two options. One is that all your remotes
are connected via VPN and each client has to write to all of them (bad,
because each individual link to a server is slow even though the servers'
combined bandwidth is fine). The other is to set up each client as a
two-or-more-node replica with the server set as favorite-child (better
performance, though there's a risk: if your client edits a file someone
else has also edited, the server version will replace the local version).
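
Here's a sketch of that second option for one client, assuming the office
server exports a volume called brick and the local replica lives in
/data/local (the hostname and path are placeholders):

volume office
  type protocol/client
  option transport-type tcp/client
  option remote-host storage1.example.com
  option remote-subvolume brick
end-volume

volume local
  type storage/posix
  option directory /data/local         # this machine's own copy of the data
end-volume

volume mirror
  type cluster/afr
  option favorite-child office         # on conflict, the office copy wins
  subvolumes office local
end-volume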

See the risks of favorite-child:

favorite-child (default: none)
The value of this option must be the name of a subvolume. If given,
the specified subvolume will be preferentially used in resolving
conflicts ("split-brain"). This means if a discrepancy is noticed in
the attributes or content of a file, the copy on the `favorite-child'
will be considered the definitive version and its contents will
overwrite the contents of all other copies. Use this option with
caution! It is possible to lose data with this option. If you are in
doubt, do not specify this option.

http://www.gluster.org/docs/index.php/Understanding_AFR_Translator#Translator_Options

Though this only comes into play in the case of split-brain (i.e. one
client writes a change while disconnected, and someone else changes the
server copy while connected; when the disconnected client reconnects, the
server copy overwrites its local copy the next time that file is accessed).

So it's good as a shared drive, but I wouldn't keep the only copy of your
work on it :) (unless you're like us, in a situation where all your clients
connect via Samba to the servers, and the servers do all the replication,
with their client mounts containing the local server and its pair server).

Overall afr and unify are very flexible. Anything in an afr group gets
copied to all the members of the group; anything in a unify group is
treated just like a PV for an LVM volume, just storage you can allocate.
So you can do fun stuff like replicating certain folders on each client
using multiple afr groups, i.e. one per worker: make each worker's own
folder/volume the one their machine is the favorite-child for, then unify
all the replicated volumes that creates into the overall volume each worker
sees. That way you get better reliability of data, i.e. if you make a
change in your dir, no one else's changes will overwrite it, and your
changes will end up on everyone's machine once they connect.
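
A rough sketch of that with two workers, assuming alice, bob, server-alice,
server-bob and a small namespace volume ns are protocol/client volumes
defined earlier in the same volfile in the style of remote1 above (all the
names here are made up):

volume worker1
  type cluster/afr
  option favorite-child alice          # alice's own machine wins conflicts in her volume
  subvolumes alice server-alice
end-volume

volume worker2
  type cluster/afr
  option favorite-child bob            # bob's machine wins conflicts in his volume
  subvolumes bob server-bob
end-volume

volume shared
  type cluster/unify
  option scheduler rr                  # round-robin placement of new files
  option namespace ns                  # unify needs a separate namespace volume
  subvolumes worker1 worker2
end-volume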

On Fri, Feb 20, 2009 at 9:56 AM, Brad Campbell <brad at wasp.net.au> wrote:
>
> Adon Metcalfe wrote:
>>
>> I use Glusterfs ;P
>>
>> http://www.gluster.org/
>>
>> basically it has a feature called AFR (Automatic File Replication) which replicates a file to all servers in the afr group when it is created/accessed if available (uses extended attributes to keep track of this data), and because its done only when a file is accessed, if there is a period of downtime, the network load isn't saturated when it comes back up, it only syncs the files that people access that gluster figures out are out of sync. I'm using 1.3.10 from intrepids repo, though 2.0 is meant to be wayyyy better (can do all sorts of cool stuff see: http://www.gluster.org/docs/index.php/Whats_New_v2.0, specifically atomic write support).
>>
>> And its incredibly simple to configure, 1 config file for server, 1 config file for client, replication can be specified on either the client or server, uses existing filesystem with extended attributes as data store, so really easy to move data into :)
>
> This looks like it might do what I need also.
>
> There are a few of us setting up a distributed "home-office" system where we can all work from home and share resources. One of the big requirements for this is a replicated file store at each house.
>
> The idea is along the lines of a single file-server like you would have in a normal office, but because of the speed/latency of home ADSL connections I thought it would work better if we could cluster the storage with local distributed replicas. I'd not found anything that looked remotely suitable until you pointed out Glusterfs. This *looks* like it might do what we need, but I can't find any information on using more than one distributed replica. Can you elaborate a little on your usage case for the FS ?
>
> Brad
> --
> Dolphins are so intelligent that within a few weeks they can
> train Americans to stand at the edge of the pool and throw them
> fish.
> _______________________________________________
> PLUG discussion list: plug at plug.org.au
> http://www.plug.org.au/mailman/listinfo/plug
> Committee e-mail: committee at plug.linux.org.au



--
Adon Metcalfe
mobile: 0429 080 931
Labyrinth Data Services
http://www.labyrinthdata.net.au


