Monday, March 18 2013
PostgreSQL, Skytools, Londiste

Bulk Replication

In the previous article here we talked about how to properly update more than one row at a time, under the title Batch Update. We did consider performances, including network round trips, and did look at the behavior of our results when used concurrently.

A case where we want to apply the previous article approach is when replicating data with a trigger based solution, such as SkyTools and londiste. Well, maybe not in all cases, we need to have a amount of UPDATE trafic worthy of setting up the solution. As soon as we know we're getting to replay important enough batches of events, though, certainly using the batch update tricks makes sense.

It so happens that londiste 3 includes the capability to use handlers. Those are plugins written in python (like all the client side code from SkyTools) whose job is to handle the processing of the event batches. Several of them are included in the londiste sources, and one of them is named bulk.py.

Bulk loading data with londiste

To use set in londiste.ini:

handler_modules = londiste.handlers.bulk

then add table with one of those commands:

londiste3 add-table xx --handler="bulk"
londiste3 add-table xx --handler="bulk(method=X)"

The default method is 0, and the available methods are the following:

correct ( 0) - inserts as COPY into table - update as COPY into temp table and single UPDATE from there - delete as COPY into temp table and single DELETE from there

delete ( 1) - as correct, but update are done as DELETE then COPY

merged (2) - as delete, but merge insert rows with update rows

Conclusion

Yes, by using that handler which is provided by default in londiste, you will apply the previous article tricks in your replication solution. And you can even choose to use that for only some of the tables you are replicating.