The idea of the day ain't directly from me, I'm just helping with a very
thin subpart of the problem. The problem, I can't say much about, let's just
assume you want to reduce the storage of MD5 in your database, so you want
to abuse bit strings. A solution to use them works fine, but the datatype is
still missing some facilities, for example going from and to hexadecimal
representation in text.
create or replace function hex_to_varbit(h text) returns varbit language sql as $$ select ('X' || $1)::varbit; $$; create or replace function varbit_to_hex(b varbit) returns text language sql as $$ select array_to_string(array_agg(to_hex((b << (32*o))::bit(32)::bigint)), '') from (select b, generate_series(0, n-1) as o from (select $1, octet_length($1)/4) as t(b, n)) as x $$;
To understand the magic in the second function, let's walk through the tests
one could do when wanting to grasp how things work in the bitstring world
(using also some reading of the fine documentation, too).
=# select ('101011001011100110010110'::varbit << 0)::bit(8);
bit
----------
10101100
(1 row)
=# select ('101011001011100110010110'::varbit << 8)::bit(8);
bit
----------
10111001
(1 row)
=# select ('101011001011100110010110'::varbit << 16)::bit(8);
bit
----------
10010110
(1 row)
=# select * from *TEMP VERSION OF THE FUNCTION FOR TESTING*
o | b | x
---+----------------------------------+----------
0 | 10101100101111010001100011011011 | acbd18db
1 | 01001100110000101111100001011100 | 4cc2f85c
2 | 11101101111011110110010101001111 | edef654f
3 | 11001100110001001010010011011000 | ccc4a4d8
(4 rows)
What do we get from that, will you ask? Let's see a little example:
=# select hex_to_varbit(md5('foo'));
hex_to_varbit
----------------------------------------------------------------------------------------------------------------------------------
10101100101111010001100011011011010011001100001011111000010111001110110111101111011001010100111111001100110001001010010011011000
(1 row)
=# select md5('foo'), varbit_to_hex(hex_to_varbit(md5('foo')));
md5 | varbit_to_hex
----------------------------------+----------------------------------
acbd18db4cc2f85cedef654fccc4a4d8 | acbd18db4cc2f85cedef654fccc4a4d8
(1 row)
Storing varbits rather than the text form of the MD5 allows us to go from
6510 MB down to 4976 MB on a sample table containing 100 millions
rows. We're targeting more that that, so that's a great win down here!
In case you wonder, querying the main index on varbit rather than the one on
text for a single result row, the cost of doing the conversion with
varbit_to_hex seems to be around 28 µs. We can afford it.
Hope this helps!
