Tuesday, October 16 2012

Prefixes and Ranges

It's been a long time since I last had some time to spend on the prefix PostgreSQL extension and its prefix_range data type. With PostgreSQL 9.2 out, some users wanted me to update the extension for that release, and hinted me that it was high time that I fix that old bug for which I already had a patch.

prefix_range release 1.2.0

I'm sorry it took that long. It's now done, you can have prefix 1.2.0 from https://github.com/dimitri/prefix or if you want a tagged tarball then you can use this link: https://github.com/dimitri/prefix/tarball/v1.2.0.

The changelog is all about fixing an index search bug and updating the package to primarily be an extension for PostgreSQL 9.1 and 9.2. Of course older Major Versions are still supported (all of them since 8.1, but please first consider upgrading PostgreSQL) if you want to install it manually, using the prefix--1.2.0.sql file.

debian package

And thanks to Christoph Berg the debian package is already validated and has reached debian experimental. We don't target sid these days because debian is preparing a new stable release, so there's a freeze. I think. Anyway, take your prefix package from here if you need it: http://packages.debian.org/experimental/postgresql-9.1-prefix.

Range Types

If you step back a little there's an interesting question to answer here. Why isn't prefix_range and PostgreSQL Range Type? Given the names it seems like a pretty good candidate.

Well the thing is that to make a generic range type you need to have a total ordering on the range elements, and a distance function that tells you how far any two elements of a range are one from each other.

When talking about prefixes, I don't see how to do that. The prefix range ['abcd', 'abce') contains an infinity of elements, all the strings that begin with the letters abcd. I guess that coming with an ordering on text is possible, but what if any text element represents a prefix?

I mean that in our case, the elements would be of type prefix, and 'abcd' is a prefix of 'abcdefg'. The question I want to answer is that given a table with prefixes 'abcd', 'abce' and 'abcde' which row in there has the longest prefix matching the literal 'abcdef'.

I'm not seeing how to abuse the Range Types mechanism to implement that, so if you have some ideas please share them!