Updated Filer Active Active cross cluster continuous synchronization (markdown)

Jérôme BAROTIN 2021-07-22 11:51:17 +02:00
parent ec3526336d
commit 45b4431493

@ -22,7 +22,16 @@ weed filer.sync -a <filer1_host>:<filer1_port> -b <filer2_host>:<filer2_port> -i
At the beginning, it will bootstrap from the beginning of time, or resume from the last replication checkpoint. Later, it will just run continuously and persist checkpoints periodically.
## More clusters?
## How it works?
Each filer has its own local change logs. `weed filer.sync` will read the logs and replay them in the other cluster.
`weed filer.sync` remembers each filer's "signature" and replication checkpoints. So you can stop `weed filer.sync` and start it later safely.
Also, the "signature" ensures that the same change is applied only once on each filer, so Active-Active synchronization does not cause repeated ping-pong changes for a single file update.
## More clusters ?
If there are 3 or more clusters, you can choose a fully connected setup, a chained setup, or any more complicated topology.
@ -64,16 +73,16 @@ cluster1 <-- filer.sync --> cluster2 -- filer.sync --> cluster3 -- filer.sync -
+----- filer.sync --> cluster5
```
# How it works?
## Filer Proxy
Each filer has its own local change logs. `weed filer.sync` will read the logs and replay them in the other cluster.
By default, filer.sync uploads files directly to the cluster's master and volume servers, using the IPs configured in the filer. These IPs may not be reachable from filer.sync because of the network configuration (for example, when cluster1 and cluster2 are not hosted on the same hosting provider). In this case, the filerProxy option makes filer.sync perform all transfers through the filer. To enable it, add `-a.filerProxy` and/or `-b.filerProxy` to the `weed filer.sync` command line.
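As a minimal sketch of the filerProxy option described above, the snippet below assembles a command with the proxy enabled on both sides. The host names and ports are hypothetical placeholders, and the snippet only prints the command so it can be reviewed before it is run.

```shell
# Hypothetical filer addresses; replace with your own.
FILER_A="filer1.example.com:8888"
FILER_B="filer2.example.com:8888"

# -a.filerProxy / -b.filerProxy route transfers through the filer on each side.
SYNC_CMD="weed filer.sync -a ${FILER_A} -b ${FILER_B} -a.filerProxy -b.filerProxy"

# Print the command for review before running it.
echo "${SYNC_CMD}"
```

If only one side is unreachable directly, keeping just the matching `-a.filerProxy` or `-b.filerProxy` flag should be enough.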
`weed filer.sync` remembers each filer's "signature" and replication checkpoints. So you can stop `weed filer.sync` and start it later safely.
## Debug log
To see the details of all transfers executed by filer.sync, add `-a.debug` and/or `-b.debug` to the `weed filer.sync` command line.
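A small sketch of a debug-enabled invocation, assuming the same hypothetical filer addresses as placeholders. It only prints the assembled command rather than starting the synchronization.

```shell
# Hypothetical filer addresses; replace with your own.
FILER_A="filer1.example.com:8888"
FILER_B="filer2.example.com:8888"

# -a.debug / -b.debug log the transfers executed for each side.
DEBUG_CMD="weed filer.sync -a ${FILER_A} -b ${FILER_B} -a.debug -b.debug"

# Print the command for review before running it.
echo "${DEBUG_CMD}"
```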
Also, the "signature" ensures that the same change is applied only once on each filer, so Active-Active synchronization does not cause repeated ping-pong changes for a single file update.
# Limitations
## Limitations
This should be fairly scalable. However, it is limited by network bandwidth and latency. So even though changes are received within milliseconds and replayed right away, there would be data discrepancies if a file is changed quickly in two distant data centers.
For large clusters, the rate of change may be so high that the replication cannot catch up. You may want to synchronize only a specific folder to reduce the workload.
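Restricting synchronization to one folder might look like the sketch below. It assumes that `weed filer.sync` accepts `-a.path` and `-b.path` options, which are not shown in this page, so confirm them with `weed filer.sync -h` before relying on them. The addresses and the folder name are hypothetical, and the snippet only prints the command.

```shell
# Hypothetical filer addresses and folder; replace with your own.
FILER_A="filer1.example.com:8888"
FILER_B="filer2.example.com:8888"
SYNC_DIR="/buckets/shared"

# Assumed flags: -a.path / -b.path limit the sync to one directory per side.
PATH_CMD="weed filer.sync -a ${FILER_A} -b ${FILER_B} -a.path ${SYNC_DIR} -b.path ${SYNC_DIR}"

# Print the command for review before running it.
echo "${PATH_CMD}"
```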