In our previous article
Aggregating NBA data, PostgreSQL vs MongoDB we spent
time comparing the pretty new
*MongoDB Aggregation Framework* with the decades
old SQL aggregates. Today, let's showcase more of those SQL aggregates,
producing a nice
*histogram* right from our SQL console.

## PostgreSQL and Mathematics

The other day while giving a
Practical SQL training my attention drifted to
the
`width_bucket`

function available as part of the
Mathematical Functions and Operators PostgreSQL is offering to its fearless
SQL users.

Here's what the documentation says about it:

The function width_bucket(op numeric, b1 numeric, b2 numeric, count int) returns (as an int) the bucket to which operand would be assigned in an equidepth histogram with count buckets, in the range b1 to b2.

For example

`width_bucket(5.35, 0.024, 10.06, 5)`

returns`3`

.

Let's have a look at our dataset from the NBA games and statistics, and get
back to counting
*rebounds* in the
`drb`

field. A preliminary query informs us
that we have stats ranging from 10 to 54 rebounds per team in a single game,
a good information we can use in the following query:

```
select width_bucket(drb, 10, 54, 9), count(*)
from team_stats
group by 1
order by 1;
width_bucket | count
--------------+-------
1 | 52
2 | 1363
3 | 8832
4 | 20917
5 | 20681
6 | 9166
7 | 2093
8 | 247
9 | 20
10 | 1
(10 rows)
```

*We've just calibrated our games in terms of rebounds here*

## Console Histograms

Now, what would it take to actually be able to display the full story right
into our
`psql`

console, for preview before actually integrating a new diagram
in our reporting solution? Turns out it's not very complex.

First, we want to avoid hard coding the range of
*rebounds* we're processing,
so we are going to compute that in a first step. Then we want the
*histogram*
data, which is a ordered list of ranges with a min and a max value and a
*frequency*, which is how many games were recorded with a number or
*rebounds*
within any given
*bucket* range. And last, we want to display something a
little more visual than just a list of numbers:

```
with drb_stats as (
select min(drb) as min,
max(drb) as max
from team_stats
),
histogram as (
select width_bucket(drb, min, max, 9) as bucket,
int4range(min(drb), max(drb), '[]') as range,
count(*) as freq
from team_stats, drb_stats
group by bucket
order by bucket
)
select bucket, range, freq,
repeat('*', (freq::float / max(freq) over() * 30)::int) as bar
from histogram;
bucket | range | freq | bar
--------+---------+-------+--------------------------------
1 | [10,15) | 52 |
2 | [15,20) | 1363 | **
3 | [20,25) | 8832 | *************
4 | [25,30) | 20917 | ******************************
5 | [30,35) | 20681 | ******************************
6 | [35,40) | 9166 | *************
7 | [40,45) | 2093 | ***
8 | [45,50) | 247 |
9 | [50,54) | 20 |
10 | [54,55) | 1 |
(10 rows)
Time: 53.570 ms
```

The query is using the
Common Table Expressions
`WITH`

syntax so that it's
easier to read and understand, then the data-related magic happens in the
`histogram`

CTE. We use the
`width_bucket`

function to get a
*calibration* number
for each of our games, and we
`GROUP BY bucket`

to be able to
* aggregate* the

*min*and

*max*values.

As we're using PostgreSQL though, just having two columns with the min and
max as separate values is not enough, what we actually need is a
discrete range of
*rebounds* for each
*bucket*, hence using the
`int4range`

range
constructor function.

Of course, within the same
`GROUP BY`

aggregation here it's still possible to
*count* the number of games having a
*rebounds* stat within the bucket, defining
the histogram's frequency.

The only remaining step then consists into hacking our way into actually
displaying something
*visual enough* for a quick less-than-1-minute effort of
data crunching, using the
`repeat`

function which is part of
PostgreSQL String Functions and Operators. Note that we're using the
Window Function expression
`max(freq) over()`

to have access the highest
frequency value from each and every result row.

So... Did I mention lately?

**PostgreSQL is YeSQL!**By the way, the whole scripting and data and SQL is available at github/dimitri/nba, and there's an Hacker News entry to comment on the article if you're interested.