In the article from yesterday we talked about PostgreSQL HyperLogLog with some details. The real magic of that extension has been skimmed over though, and needs another very small article all by itself, in case you missed it.

Which Set Operation do you want for counting unique values?
The first query here has the default level of magic in it, really. What happens is that each time we do an update of the HyperLogLog hash value, we update some data which are allowing us to compute its cardinality.
=> select date, #users as daily, pg_column_size(users) as bytes from daily_uniques order by date; date | daily | bytes ------------+------------------+------- 2013-02-22 | 401676.779509985 | 1287 2013-02-23 | 660187.271908359 | 1287 2013-02-24 | 869980.029947449 | 1287 2013-02-25 | 580865.296677817 | 1287 2013-02-26 | 240569.492722719 | 1287 (5 rows)
And has advertized the data is kept in a static sized data structure. The
magic here all happens at hll_add() time, the function you have to call to
update the data.
Now on to something way more magic!

Are those the aggregates you're looking for?
=> select to_char(date, 'YYYY/MM') as month, round(#hll_union_agg(users)) as monthly from daily_uniques group by 1; month | monthly ---------+--------- 2013/02 | 1960380 (1 row)
The HyperLogLog data structure is allowing the implementation of an union
algorithm that will be able to compute how many unique values you happen to
have registered in both one day and the next. Extended in its general form,
and doing SQL, what you get is an aggregate that you can use in GROUP BY
constructs and window functions. Did you read about them yet?
Tags
Previous Articles
- Live Upgrading PGQ Friday, February 08 2013, 15:52
- Marking whole word Friday, February 08 2013, 17:15
- Playing with pgloader Tuesday, February 12 2013, 11:17
- PostgreSQL HyperLogLog Monday, February 25 2013, 10:23
Next Articles
- Emacs Conference Monday, March 04 2013, 13:58
- Batch Update Friday, March 15 2013, 10:47
- Bulk Replication Monday, March 18 2013, 14:54
- The Need For Speed Friday, March 29 2013, 09:49

