Simple Case for Pivoting in SQL
In a recent article Craig Kerstiens from Heroku did demo the really useful crosstab extension. That function allows you to pivot a table so that you can see the data from different categories in separate columns in the same row rather than in separate rows. The article from Craig is Pivoting in Postgres.
Let’s do the same setup as he did, with a table containing some randomly generated data about hypothetical visits to a web page, say, by date then by operating system.
create table daily_visits_per_os as select date::date, b.desc AS TYPE, (random() * 10000 + 1)::int AS val from generate_series((now() - '100 days'::interval)::date, now()::date, '1 day'::interval) as t(date), (SELECT unnest(ARRAY['OSX', 'Windows', 'Linux']) AS DESC) b;
And we can see those lines with the following SQL:
table daily_visits_per_os limit 12;
date | type | val ------------+---------+------ 2013-03-26 | OSX | 1583 2013-03-26 | Windows | 3075 2013-03-26 | Linux | 848 2013-03-27 | OSX | 4377 2013-03-27 | Windows | 7685 2013-03-27 | Linux | 9550 2013-03-28 | OSX | 3063 2013-03-28 | Windows | 7920 2013-03-28 | Linux | 2760 2013-03-29 | OSX | 1873 2013-03-29 | Windows | 8123 2013-03-29 | Linux | 866 (12 rows)
TABLEis a real SQL statement from the standard!
Now that we have some data to play with, what we want is the number of visits per os as different columns, having a result with 4 columns: the date, the number of visits using OSX that day, then using Windows, then using Linux. How to do that in plain SQL?
select date, sum(case when type = 'OSX' then val end) as osx, sum(case when type = 'Windows' then val end) as windows, sum(case when type = 'Linux' then val end) as linux from daily_visits_per_os group by date order by date limit 4;
date | osx | windows | linux ------------+------+---------+------- 2013-03-26 | 1583 | 3075 | 848 2013-03-27 | 4377 | 7685 | 9550 2013-03-28 | 3063 | 7920 | 2760 2013-03-29 | 1873 | 8123 | 866 (4 rows)
The other advantage of that solution —
apart from being standard SQL — is
that if you happen to have more than a single row per group for a given
category (after all, the data definition is not normalized here, it’s known
EAV and you want to stay away from that as much as possible), then the
sum(case) query will work just fine.
Starting with PostgreSQL 9.4 it’s also possible to write the query using the
FILTER syntax as in the following:
select date, sum(val) filter(where type = 'OSX') as osx, sum(val) filter(where type = 'Windows') as windows, sum(val) filter(where type = 'Linux') as linux from daily_visits_per_os group by date order by date limit 4;