Monday, October 14 2013
PostgreSQL, Extensions, plproxy, YeSQL

PostgreSQL Autonomous Transaction

PostgreSQL is an all round impressive Relational DataBase Management System which implements the SQL standard (see the very useful reference page Comparison of different SQL implementations for details). PostgreSQL also provides with unique solutions in the database market and has been leading innovation for some years now. Still, there's no support for Autonomous Transactions within the server itself. Let's have a look at how to easily implement them with PL/Proxy.

In PostgreSQL we have pluggable languages: it's possible to add support for programming languages to write your own stored procedures, and the core server ships with support for 5 such languages: PL/C, PL/SQL, PLpgSQL, PL/perl, PL/python and PL/tcl.

The PL/Proxy procedural language is not about providing an existing programming language. It's about providing the user with remote procedure call and sharding facilities to spread any kind of laod in between a herd of PostgreSQL servers. It's a very good Scaling Out solution, that we're going to use for something quite different here.

Remote Procedure Calls

Now, the main feature PL/proxy provides and that we're going to benefit from today is the remote procedure call facility: when a function is called on a server, proxy it to another one, calling the same function with the same arguments over there, and fetching the result back. Classic proxying and RPC.

PL/Proxy will use binary transfers when possible to avoid marshalling steps here

The main thing to understand with PL/Proxy is that the remote procedure call happens in its own transaction, when it returns it's already been committed on the remote server. So there's no local control over the PL/Proxy transaction, if you need to ROLLBACK locally that's too late.

Well, that limitation actually is a very interesting feature if what you want to obtain is an Autonomous Transaction, because it's actually what it is. If you want the Autonomous Transaction to happen locally, all is needed is to connect the proxy back to the current database.

A practical Use Case: auditing trigger

Autonomous Transactions are useful when several units of processing need to be done, and we want to be able to COMMIT some of those without impacting the main transaction's ability to ROLLBACK.

Note that with the savepoint standard feature it's possible to ROLLBACK a part of the processing while still issuing a COMMIT for the overall transaction, so if that's what you need to do, you're already covered with a stock install of PostgreSQL.

Now, say you want to log any attempt to UPDATE a row in that specific critical table of yours, even if the transaction is then aborted. That's often referenced under the name audit trigger and we already saw how to implement such a trigger in our article Auditing Changes with Hstore.

The whole goal of today's exercise is going to populate our audit table even when the main transaction fails. Let's first see what happens with the solution we had already when we ROLLBACK the main transaction:

~# begin;

~*# update example set f1 = 'b' where id = 1;

~*# rollback;

~# select * from audit;
 change_date | before | after 
(0 rows)

The auditing table is not populated.

Installing PLproxy

It begins as usual:

~# create extension plproxy;

For that command to work you need to have installed the Operating System Package for plproxy (if using PostgreSQL 9.3 under debian you need to install the postgresql-9.3-plproxy package, as found in the PostgreSQL debian repository). If you don't have a package for PL/Proxy you need to fetch the sources from then run make install.

The Setup

Now that we have the extension, we need to use the CREATE SERVER command to have an entry point to a remote transaction on the same connection string.

~# create server local foreign data wrapper plproxy options(p0 'dbname=dim');

~# create user mapping for public server local options(user 'dim');

~# create function test_proxy(i int)
           returns int
          language plproxy
as $$
  cluster 'local';
  select i;

~# select test_proxy(1);
(1 row)

Time: 0.866 ms

So we have a PL/proxy cluster to use, named local, and we tested it with a very simple function that just returns whatever integer we give it as argument. We can see that the overhead to reconnect locally is not daunting in our pretty simple example.

The remote auditing trigger

We already had a trigger function named audit that will work with the magic variables NEW and OLD. What we want now is that the trigger function issues a remote procedure call to our PL/proxy local connection instead:

create function audit_trigger()
  returns trigger
  language plpgsql
as $$
  perform audit_proxy(old, new);
  return new;

create function audit_proxy(old example, new example)
  returns void
  language plproxy
as $$
  cluster 'local';
  target audit;

create or replace function audit(old example, new example)
  returns void
  language SQL
as $$
  INSERT INTO audit(before, after) SELECT hstore(old), hstore(new);   

drop trigger if exists audit on example;

    create trigger audit
      after update on example
          for each row          -- defaults to FOR EACH STATEMENT!
 execute procedure audit_trigger();

What you can see in that new setup is that the trigger calls the function audit_trigger which in turns call the function audit_proxy. That proxy function is the key for us to benefit from the PL/proxy remote transaction management, all the proxy function does is connect back to localhost then call the function named audit with the same parameter it got called with.

The previous example was made quite generic thanks to using hstore. We can't use pseudo types with PL/proxy so we need a pair of functions per table we want to be able to audit in this fashion.

Now here's what happens with those definitions and an aborted UPDATE:

~# begin;

~*# update example set f1 = 'b' where id = 1;

~*# rollback;

~# select change_date, before, after, after-before as diff from audit;
-[ RECORD 1 ]--------------------------------
change_date | 2013-10-14 14:29:09.685105+02
before      | "f1"=>"a", "f2"=>"a", "id"=>"1"
after       | "f1"=>"b", "f2"=>"a", "id"=>"1"
diff        | "f1"=>"b"

The aborted update has been captured in the audit logs!


Thanks to a design where extensibility is a first class citizen, PostgreSQL makes it possible to implement Autonomous Transactions without having to edit its source code. Here we're using a special kind of a Foreign Data Wrapper: the PL/proxy driver allows implementing remote procedure calls and sharding.

If you need Autonomous Transactions and though PostgreSQL might not be suitable in your case, now is the time to review your choice!