PostgreSQL master slave(s) asynchronous replication

The easy way. Really.

The SkyTools project is a suite of tools allowing to easily (I mean, in a really easy way) implement a replication solution, and I'm keeping a SkyTools page here.

You'll have to run two daemons for the replication to be operational, first a PGQ maintenance one (pgqadm.py), which role is to manage the provider queue. Then the replication daemon itself, londiste.py, which is a PGQ consumer implementation.

That means londiste.py is using PGQ as a transport and queuing mecanism, and "only" cares about replaying modification on the subscriber(s).

Given the current state of affairs, I'm thinking that a kind of FAQ could be welcomed.

Table of contents

Basic vocabulary

SkyTools uses those terms, so maybe you want to hear about them first:

provider
server at the origin of data, often called master.
subscriber
server receiving the data changes and applying them, often called slave.
lag
time delta between latter produced event on the provider and latter consumed event (or batch) by a subscriber (or consumer), measure per subscriber (actually, per consumer).
consumer
generic term for a PGQ daemon fetching data from the provider, londiste.py is an example of a PGQ consumer.
queue
events to get consumed. In londiste terms, a queue can be seen as a set of tables for which subscription is offered to subscribers.

Frequently Asked Questions

When is it a good idea to use londiste?

Let's try to have a simple easy answer, as that's the spirit here:

Then londiste.py is for you. Really, it is. It's even shining as a solution.

Some other projects are claiming to be solutions to, but from what I've been hearing (and from people in the know), they're as much problems as solutions.

What does londiste lack that slony has?

Short answer: cascading support, execute script, failover/switchover.

Some more details, maybe...

cascading support

If you consider cascading is supported when your data flows from master A to slave B and from there to slave C, then in case of B crashing, C still receives its data. This is not yet supported by londiste.

DDL management

Londiste does none, but that does not mean you can't do it. It's even not tricky at all in the simple cases. Say you have a master named M which is providing data for a consumer named S, and you want to add the table T to the replication:

 $ londiste.py <ini> provider add schema.T
 $ londiste.py <ini> subscriber add schema.T

Ok that's supported and easy. Now you want to add a column C to this table:

 $ psql S
 S# ALTER TABLE schema.T ADD COLUMN C ...;

 $ psql M
 M# ALTER TABLE schema.T ADD COLUMN C ...;
 M# SELECT pgq.provider_refresh_triggers('queue_name', 'schema.T');

(Remember, M is master and S slave, or in londiste terms, provider and subscriber).

failover / switchover

The londiste manual page talks about it, and what it has to say it's simply that it's not part of the design and use case of londiste. If you really need it, please use another software.

Now, of course you can get it to work, but londiste won't help you not mess up your lagging data, so don't rely too much on it.

Let's talk about this master

Londiste is a single master to several slaves solutions. That means any given table has to see its data flow from its master to its slaves.

Now it's possible to have, in the same PostgreSQL database, some provider tables, from which the data are flowing, and some subscriber tables, to which some other data are flowing.

Single master only means than any one given table on a given host can not be considered at the same time as providing data and as subscribing to remote data. That is multi-master replication, and another topic entirely.

Multi-master, single slave (federated database)

Yes, you can.

Ok, what we are talking about here is having a central database containing a live copy of the data provided by more than one masters.

This is quite easy to do when you realize that PostgreSQL and londiste both refer to tables by their schema-qualified name internally. Ok, that's not exactly that in the case of PostgreSQL, but it's akin to that.

The solution is then to create a globaly unique local schema on each provider, and to create all those schemas on the central subscriber. You then independantly replicate provider1.table and provider2.table to tables of the same schema-qualified name on the central server.

And the cherry on the cake is to create a federation.table table on the central server, and to either use inheritance or a subscriber trigger to move data to the main table.

When using a subscriber trigger, mind the londiste.py handling of triggers and the following command:

  $ londiste.py <ini> subscriber restore-triggers <schema.table>

Asynchronous-what?

That async thing means you don't have any guarantee at commit time on the provider (the master) that the changeset you made is replayed already on the slave.

The drawback is that cases exist where the provider has happily commited a change that the slave finds itself unable to apply (no space left on device is the first example coming to my mind).

The advantage is that in case of network failure, disk shortage, or some other kinf of outage, the provider see very little impact: its queues are growing more than usual, that's it.

Depending on your needs, you'll be more interrested into the advantage or the drawback.

How do this ticker thing relates to londiste?

The ticker is an independant daemon facility which is responsible for generating ticks. Those ticks are produced on-demand, each time a consumer is asking for a new batch, and will get produced such as any batch contains either ticker_max_lag seconds worth of events or ticker_max_count events, whichever comes first.

Now, londiste is the replicating daemon, and in fact is just another kind of PGQ subscriber. So you're running a londiste.py daemon for each subscriber. This londiste daemon is configured to get events from the pgq_queue_name queue, which can be used on as many subscribers as you want to. On the other hand, each subscriber must have a unique job_name.

The queue daemon, pgqadm.py can be hosted on any machine, it'll often be the provider host or a third machine, independant of the replication databases. A single ticker instance can host as many queues as you want it to, but in the case of replication with londiste.py (remember, you could be using your own subscriber code atop PGQ), the queue will need to be installed on the provider database.

Next FAQ Entry

Yes there will be some more, please don't be shy and ask for more if you feel the need. That could even be the source of some documentation editing...

Can I use PGQ alone?

Yes please!

You even have more than one option in term of API to use:

Pick your choice, which mainly depends on your existing code models, or I wouldn't ever have made a PHP layer for it in the first place.

What's about documentation? Where is it, first?

Documentation can be found online on the http://pgfoundry.org/ Skytools project page. With some luck it's even up to date. Or you could simply download the tarball and check the already generated documentation material.

Now, what you'll find is a very thin amount of documentation, almost only a man page per Skytools subprojects. There are two points to keep in mind just before being so disappointed as to consider switching to another solution:

Tips & tricks, troubleshooting

I think I should provide some things in this section.

Adding all provider tables to subscriber

is as easy as this command:

  londiste.py <ini> provider tables | xargs londiste.py <ini> subscriber add

Useful SQL queries

consumers state

This query is to be run at the provider site, it gives some information for each queue and consumer.

  SELECT queue_name, consumer_name, lag, last_seen
    FROM pgq.get_consumer_info();

The lag column is showing this same lag we were talking at the beginning of this document, the last_seen column shows the timestamptz of the latter consumer request. The value of this column should never be more than 60s (and some, it's quick but not instantaneous) in the default configuration.

drop all queue events from a former consumer

When you're playing with londiste you can easily be in a situation were you want to clear up all your installation in order to start again cleanly. Or it could be that a former PGQ consumer has been deprecated in your architecture, and you want PGQ to forget all about it.

For PGQ to stop accumulating events for a never to come back consumer, use the following API:

  SELECT pgq.unregister_consumer('queue_name', 'consumer_name');

More to come

Stay tuned, or ask.

Tutorial

And maybe a new tutorial is to be written, too.

Advanced reading material

Marko gave a talk about PGQ internals.