Synchronous Replication
Although the new asynchronous replication facility that ships with 9.0 ain’t
released to the general public yet, our hacker heroes are already working on the
synchronous version of it. Part of the facility is rather easy to design:
we want something comparable to DRBD’s flexibility, but specific to our
database world. So synchronous would mean either recv, fsync or apply,
depending on what you need the standby to have already done when the master
acknowledges the COMMIT. Let’s call that the service level.
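To make the notion concrete, here is a minimal sketch of the service levels as an ordered enumeration. The names ServiceLevel and satisfies are hypothetical, and the ordering (recv weaker than fsync, fsync weaker than apply) is an assumption made for illustration, not something the proposal spells out.

```python
from enum import IntEnum


class ServiceLevel(IntEnum):
    """Hypothetical names, ordered from weakest to strongest guarantee."""
    ASYNC = 0   # the master does not wait for the standby at all
    RECV = 1    # the standby has received the changes
    FSYNC = 2   # the standby has flushed them to disk
    APPLY = 3   # the standby has applied (replayed) them


def satisfies(offered: ServiceLevel, wanted: ServiceLevel) -> bool:
    # Assumption: a stronger level implies every weaker one.
    return offered >= wanted
```

With that ordering, a standby offering apply also satisfies a transaction that only asks for fsync, while one offering recv does not.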
The part of the design that’s not so easy is more interesting. Do we need to
register standbys and have the service level set up per standby? Can we get
some more flexibility and have the service level set on a per-transaction
basis? The idea here would be that the application knows which transactions
are meant to be extra-safe and which are not, the same way that you can set
synchronous_commit to off when dealing with web sessions, for example.
Why choose? I hear you ask. Well, it’s all about having more data safety, and a typical setup would contain an asynchronous reporting server and a local synchronous failover server. Then add a remote one, too. So even if we pick the transaction-based facility, we still want to be able to choose at setup time which server to fail over to. That means we don’t want that much flexibility now: we want to know where the data is safe, we don’t want to have to guess.
One way to solve that is to be able to set up a slave as being the failover
one, or say, the sync one. Now, the detail that ruins it all is that we need
a timeout to handle worst cases when a given slave loses its connectivity
(or power, say). Now the slave ain’t in sync any more, and some people will
require that the service is still available (timeout but COMMIT) while some
will require that the service is down: don’t accept a new transaction if you
can’t make its data safe on the slave too.
The answer would be to have the master arbitrate between what the
transaction wants, what the slave is set up to provide, and what it’s able
to provide at the time of the transaction. Given a transaction with a
service level of apply and a slave set up for being async, the COMMIT does
not have to wait, because there’s no known slave able to offer the needed
level. Or the COMMIT cannot happen, for the very same reason.
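The arbitration could be sketched as follows. This is only an illustration under stated assumptions: the Standby class, the arbitrate function and the strict flag are hypothetical names, service levels are assumed to be ordered, and a disconnected standby is assumed to offer nothing better than async.

```python
from dataclasses import dataclass
from enum import IntEnum


class ServiceLevel(IntEnum):
    ASYNC = 0
    RECV = 1
    FSYNC = 2
    APPLY = 3


@dataclass
class Standby:
    name: str
    configured: ServiceLevel   # what the standby is set up to provide
    connected: bool = True     # whether it can provide it right now

    def offered(self) -> ServiceLevel:
        # Assumption: an unreachable standby only counts as async.
        return self.configured if self.connected else ServiceLevel.ASYNC


def arbitrate(wanted, standbys, strict):
    """Decide what the master does with a COMMIT.

    Returns ("wait", names) when some standbys can offer the wanted level,
    ("commit", []) when nobody can and the service must stay available,
    ("refuse", []) when nobody can and the data must be safe (strict).
    """
    if wanted == ServiceLevel.ASYNC:
        return ("commit", [])
    able = [s.name for s in standbys if s.offered() >= wanted]
    if able:
        return ("wait", able)
    return ("refuse", []) if strict else ("commit", [])
```

The strict flag stands for the two camps described above: with strict unset the COMMIT does not wait, with strict set it cannot happen.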
Then I think it all flows quite naturally from there: while arbitrating, the master could record which slave is currently offering what service level, and offer that information in a system view too, of course.
The big question that’s not answered in this proposal is how to configure whether being unable to reach the wanted service level is an error or a warning.
That too would need to be for the master to arbitrate, based on a per-standby
and a per-transaction setting, and in the general case it could be a quorum
setup: each slave is given a weight and each transaction a quorum to
reach. The master sums up the weights of the standbys that ack the
transaction at the needed service level, and the COMMIT happens as soon as
the quorum is reached, or is canceled as soon as the timeout is reached,
whichever comes first.
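The quorum rule can be simulated in a few lines. This is a sketch only: quorum_commit is a hypothetical name, acknowledgements are modeled as (elapsed seconds, standby name) pairs already filtered to the needed service level, and an ack arriving exactly at the timeout is assumed to still count.

```python
def quorum_commit(weights, acks, quorum, timeout):
    """Replay acks in arrival order and sum the weights of the standbys.

    weights: dict mapping standby name to its weight
    acks:    iterable of (elapsed_seconds, standby_name) pairs
    Returns ("commit", elapsed) as soon as the quorum is reached,
    or ("cancel", timeout) if the timeout comes first.
    """
    total = 0
    seen = set()
    for elapsed, name in sorted(acks):
        if elapsed > timeout:
            break                     # the timeout came first
        if name in seen:
            continue                  # count each standby only once
        seen.add(name)
        total += weights[name]
        if total >= quorum:
            return ("commit", elapsed)
    return ("cancel", timeout)
```

For instance, with weights of 2 for a local standby and 1 for a remote one, a quorum of 3 makes the COMMIT wait for both acks, and a slow remote ack past the timeout cancels it.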
Such a model allows for very flexible setups, where each standby has a weight and offers a given service level, and each transaction waits until a quorum is reached. Giving the right weights to your standbys (like, powers of two) allows you to set the quorum in such a way that only one given standby is able to acknowledge the most important transactions. And that’s flexible enough that you can change it at any time: it’s just a weight that allows a sum to be made, so my guess would be that it ends up in the feedback loop between the standby and its master.
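To see why powers of two are handy, here is a small sketch that lists which standbys are indispensable for a given quorum. The function name required_standbys and the example weights are hypothetical; the code simply enumerates every subset of standbys whose weights reach the quorum and keeps the names common to all of them.

```python
from itertools import combinations


def required_standbys(weights, quorum):
    """Return the standbys that appear in every subset reaching the quorum."""
    names = list(weights)
    winning = [
        set(subset)
        for size in range(1, len(names) + 1)
        for subset in combinations(names, size)
        if sum(weights[n] for n in subset) >= quorum
    ]
    if not winning:
        return set()          # the quorum cannot be reached at all
    return set.intersection(*winning)


# Hypothetical setup: weights are powers of two.
weights = {"reporting": 1, "remote": 2, "local": 4}
print(required_standbys(weights, 4))
```

With these weights, a quorum of 4 can only be reached when the local standby acknowledges, because the other two sum to 3. The local ack is also enough on its own, while a quorum of 3 can be reached either by the local standby or by the two others together.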
The most appealing part of this proposal is that it doesn’t look complex to implement, and should allow for highly flexible setups. Of course, the devil is in the details, and we’re talking about latencies in a distributed system here. That’s also being discussed on the mailing list.