Back from the FODESM 2014 Conference, here's the slides I've been using for the Advanced Extension Use Cases talk I gave, based on the ongoing work to be found under the Tour of Extensions index in this web site.
If you're interested into the talk contents, then you might be interested into the following list of articles where I actually did all the work leading to the slides in the above PDF:
Last week some PostgreSQL users, contributors and advocates have organized a really great conference in Stockholm, Sweden, where I had the please to give the following talk:
In our previous article Aggregating NBA data, PostgreSQL vs MongoDB we spent time comparing the pretty new MongoDB Aggregation Framework with the decades old SQL aggregates. Today, let's showcase more of those SQL aggregates, producing a nice histogram right from our SQL console.
When reading the article Crunching 30 Years of NBA Data with MongoDB Aggregation I coulnd't help but think that we've been enjoying aggregates in SQL for 3 or 4 decades already. When using PostgreSQL it's even easy to actually add your own aggregates given the SQL command create aggregate.
Last week I had the pleasure to present two talks at the awesome PostgreSQL Conference Europe. The first one was actually a tutorial about Writing & using Postgres Extensions where we spent 3 hours on what are PostgreSQL Extensions, what you can expect from them, and how to develop a new one. Then I also had the opportunity to present the new version of pgloader in a talk about Migrating from MySQL to PostgreSQL.
In our Tour of Extensions today's article is about advanced tag indexing. We have a great data collection to play with and our goal today is to be able to quickly find data matching a complex set of tags. So, let's find out those lastfm tracks that are tagged as blues and rhythm and blues, for instance.
PostgreSQL is an all round impressive Relational DataBase Management System which implements the SQL standard (see the very useful reference page Comparison of different SQL implementations for details). PostgreSQL also provides with unique solutions in the database market and has been leading innovation for some years now. Still, there's no support for Autonomous Transactions within the server itself. Let's have a look at how to easily implement them with PL/Proxy.
Let's get back to our Tour of Extensions that had to be kept aside for awhile with other concerns such as last chance PostgreSQL data recovery. Now that we have a data loading tool up to the task (read about it in the Loading Geolocation Data article) we're going to be able to play with the awesome ip4r extension from RhodiumToad.
Last Friday I had the chance to be speaking at the Open World Forum in the NewSQL track, where we had lots of interest and excitement around the NoSQL offerings. Of course, my talk was about explaining how PostgreSQL is Web Scale with some historical background and technical examples about what this database engine is currently capable of.
In our ongoing Tour of Extensions we played with earth distance in How far is the nearest pub? then with hstore in a series about trigger, first to generalize Trigger Parameters then to enable us to Auditing Changes with Hstore. Today we are going to work with pg_trgm which is the trigrams PostgreSQL extension: its usage got seriously enhanced in recent PostgreSQL releases and it's now a poor's man Full Text Search engine.
In a previous article about Trigger Parameters we have been using the extension hstore in order to compute some extra field in our records, where the fields used both for the computation and for storing the results were passed in as dynamic parameters. Today we're going to see another trigger use case for hstore: we are going to record changes made to our tuples.
Sometimes you want to compute values automatically at
INSERT time, like for
duration column out of a
start and an
end column, both
timestamptz. It's easy enough to do with a
BEFORE TRIGGER on your table.
What's more complex is to come up with a parametrized spelling of the
trigger, where you can attach the same
stored procedure to any table even
when the column names are different from one another.
There was SQL before window functions and SQL after window functions: that's how powerful this tool is. Being that of a deal breaker unfortunately means that it can be quite hard to grasp the feature. This article aims at making it crystal clear so that you can begin using it today and are able to reason about it and recognize cases where you want to be using window functions.
In our recent article about The Most Popular Pub Names we did have a look at how to find the pubs nearby, but didn't compute the distance in between that pub and us. That's because how to compute a distance given a position on the earth expressed as longitude and latitude is not that easy. Today, we are going to solve that problem nonetheless, thanks to PostgreSQL Extensions.
In a recent article here we've been talking about how do do Batch Updates in a very efficient way, using the Writable CTE features available in PostgreSQL 9.1. I sometime read how Common Table Expressions changed the life of fellow DBAs and developers, and would say that Writable CTE are at least the same boost again.
In a recent article Craig Kerstiens from Heroku did demo the really useful crosstab extension. That function allows you to pivot a table so that you can see the data from different categories in separate columns in the same row rather than in separate rows. The article from Craig is Pivoting in Postgres.
Tonight I had the pleasure to present a talk at the Dublin PostgreSQL User Group using remote technologies. The talk is about how to make the most ouf of PostgreSQL when using SQL as a developer, and tries to convince you to dive into mastering SQL by showing how to solve an application example all in SQL, using window functions and common table expressions.
In this article, we want to find the town with the greatest number of inhabitants near a given location.
Hier se tenait la cinquième édition de la conférence organisée par dalibo, où des intervenants extérieurs sont régulièrement invités. Le thème hier était à la fois clair et très vaste : la performance.
Performance consulting involves some tricks that you have to teach over and over again. One of them is that SQL tends to be so much better at dealing with plenty of rows in a single statement when compared to running as many statements, each one against a single row.
In the article from yesterday we talked about PostgreSQL HyperLogLog with some details. The real magic of that extension has been skimmed over though, and needs another very small article all by itself, in case you missed it.
If you've been following along at home the newer statistics developments,
you might have heard about this new
State of The Art Cardinality Estimation Algorithm called
technique is now available for PostgreSQL in the extension
https://github.com/aggregateknowledge/postgresql-hll and soon
to be in
As Guillaume says, we've been enjoying a great evening conference in Lyon 2 days ago, presenting PostgreSQL to developers. He did the first hour presenting the project and the main things you want to know to start using PostgreSQL in production, then I took the opportunity to be talking to developers to show off some SQL.
I've been given a nice puzzle that I think is a good blog article opportunity, as it involves some thinking and window functions.
Let's say you need to
ALTER TABLE foo ALTER COLUMN bar TYPE bigint;, and
PostgreSQL is helpfully telling you that no you can't because such and such
views depend on the column. The basic way to deal with that is to copy
paste from the error message the names of the views involved, then prepare a
script wherein you first
DROP VIEW ...; then
ALTER TABLE and finally
VIEW again, all in the same transaction.
You certainly know that implementing
dynamic triggers in
impossible. But I had a very bad night, being up from as soon as 3:30 am
today, so that when a developer asked me about reusing the same trigger
function code from more than one table and for a dynamic column name, I
didn't remember about it being impossible.
The drawback of hosting a static only website is, obviously, the lack of
comments. What happens actually, though, is that I receive very few comments
by direct mail. As I don't get another
spam source to cleanup, I'm left
unconvinced that's such a drawback. I still miss the low probability of
seeing blog readers exchange directly, but I think a
list would be my answer, here...
8.4 came out there was all those comments about how getting
window functions was an awesome addition. Now, it seems that a lot of people
seeking for help in
#postgresql just don't know what kind of problem this
feature helps solving. I've already been using them in some cases here in
this blog, for getting some nice overview about
Partitioning: relation size per “group”.
After discovering the excellent
Gwene service, which allows you to subscribe
newsgroups to read
RSS content (
commits, etc), I came to
read this nice article about
Happy Numbers. That's a little problem that
fits well an interview style question, so I first solved it yesterday
Emacs Lisp as that's the language I use the most those days.
The idea of the day ain't directly from me, I'm just helping with a very
thin subpart of the problem. The problem, I can't say much about, let's just
assume you want to reduce the storage of
MD5 in your database, so you want
bit strings. A solution to use them works fine, but the datatype is
still missing some facilities, for example going from and to hexadecimal
representation in text.
This time, we are trying to figure out where is the bulk of the data on
disk. The trick is that we're using
DDL partitioning, but we want a “nice”
view of size per
partition set. Meaning that if you have for example a
foo with partitions
foo_201007, you would want
to see a single category
foo containing the accumulated size of all the
This time we're having a database where sequences were used, but not systematically as a default value of a given column. It's mainly an historic bad idea, but you know the usual excuse with bad ideas and bad code: the first 6 months it's experimental, after that it's historic.