Monday, July 26 2010
PostgreSQL, Catalogs, YeSQL

Partitioning: relation size per “group”

This time, we are trying to figure out where is the bulk of the data on disk. The trick is that we're using DDL partitioning, but we want a “nice” view of size per partition set. Meaning that if you have for example a parent table foo with partitions foo_201006 and foo_201007, you would want to see a single category foo containing the accumulated size of all the partitions underneath foo.

Here we go:

select groupe, pg_size_pretty(sum(bytes)::bigint) as size, sum(bytes)
  from (
select relkind as k, nspname, relname, tablename, bytes,
         case when relkind = 'r' and relname ~ '[0-9]{6}$'
              then substring(relname from 1 for length(relname)-7)

	      when relkind = 'i' and  tablename ~ '[0-9]{6}$'
              then substring(tablename from 1 for length(tablename)-7)

              else 'core' 
          end as groupe
  from (
  select nspname, relname,
         case when relkind = 'i'
              then (select relname
                      from pg_index x 
                           join pg_class xc on x.indrelid = xc.oid
                           join pg_namespace xn on xc.relnamespace = xn.oid
                     where x.indexrelid = c.oid
              else null
           end as tablename,
         pg_size_pretty(pg_relation_size(c.oid)) as relation,
         pg_total_relation_size(c.oid) as bytes,
    from pg_class c join pg_namespace n on c.relnamespace = n.oid 
   where c.relkind in ('r', 'i') 
         and nspname in ('public', 'archive')
         and pg_total_relation_size(c.oid) > 32 * 1024
order by 5 desc
       ) as s
       ) as t
group by 1
order by 3 desc;

Note that by simply removing those last two lines here, you will get a detailed view of the indexes and tables that are taking the most volume on disk at your place.

Now, what about using window functions here so that we get some better detailed view of historic changes on each partition? With some evolution figure in percentage from the previous partition of the same year, accumulated size per partition and per year, yearly sum, you name it. Here's another one you might want to try, ready for some tuning (schema name, table name, etc):

  select relname, 
         pg_relation_size(c.oid) as size,
         pg_total_relation_size(c.oid) as tsize,
         substring(substring(relname from '[0-9]{6}$') for 4)::bigint as year
    from pg_class c 
         join pg_namespace n on n.oid = c.relnamespace 
   where c.relkind = 'r'
     -- and n.nspname = 'public'
     -- and c.relname ~ 'stats' 
     and substring(substring(relname from '[0-9]{6}$') for 4)::bigint >= 2008
order by relname
     sy AS (
  select relname, 
         (sum(size) over w_year)::bigint as ysize,
         (sum(size) over w_month)::bigint as cumul,
	 (lag(size) over (order by relname))::bigint as previous 
    from s
  window w_year  as (partition by year),
         w_month as (partition by year order by relname)
     syp AS (
  select relname, 
	 rank() over (partition by year order by size desc) as rank,
         case when ysize = 0 then ysize 
	      else round(size / ysize::numeric * 100, 2) end as yp, 
         case when previous = 0 then previous
	      else round((size / previous::numeric - 1.0) * 100, 2) end as evol, 
    from sy
  SELECT relname, 
         pg_size_pretty(size) as size,
         pg_size_pretty(tsize) as "+indexes",
         evol, yp as "% annuel", rank,
         pg_size_pretty(cumul) as cumul, year,
         pg_size_pretty(ysize) as "yearly sum", 
         pg_size_pretty((sum(size) over())::bigint) as total
    FROM syp
ORDER BY relname;

Hope you'll find it useful, I certainly do!