In another article here, entitled on JSON and SQL, we saw in great details how to import a data set only available as a giant JSON file. Then we normalized the data set, so as to be able to write SQL and process our data. This approach is sometimes very useful and was a good way to learn some of the JSON functions provided by PostgreSQL.
In this article, we’re going to use SQL to export the data from our relational model into a JSON document. The trick that makes it complex in this example is that we have a recursive data model, with a notion of a parent row that exists in the same table as the current one. That’s a nice excuse to learn more about the SQL construct WITH RECURSIVE.
Table of Contents
The context in which I’ve been asked for help on this topic is unusual. A friend of mine is preparing a campaign in his current favorite Role Playing Game, and for that taking notes of Non Playing Characters. The game makes it complex enough that my friend wants to build some dynamic visualisation of the data, as in the excellent tool Zoomable Circle Packing example.
Dungeons & Dragons Character Classes
So we’re going to play in the same domain, in a way, by modeling Dungeons & Dragons Character Classes, as in the following data loading SQL file:
--
-- https://en.m.wikipedia.org/wiki/Character_class_(Dungeons_%26_Dragons)
--
begin;
create table dndclasses
(
id int generated by default as identity primary key,
parent_id int references dndclasses(id),
name text
);
-- INSERT commands
-- ...
commit;
This data set looks like in the following query result, which is the ouput
of a very simple TABLE dndclasses;
SQL command:
id │ parent_id │ name
════╪═══════════╪═══════════════════════════
1 │ ¤ │ Warrior
2 │ ¤ │ Wizard
3 │ ¤ │ Priest
4 │ ¤ │ Rogue
5 │ 1 │ Fighter
6 │ 1 │ Paladin
7 │ 1 │ Ranger
8 │ 2 │ Mage
9 │ 2 │ Specialist wizard
10 │ 3 │ Cleric
11 │ 3 │ Druid
12 │ 3 │ Priest of specific mythos
13 │ 4 │ Thief
14 │ 4 │ Bard
15 │ 13 │ Assassin
(15 rows)
I’ve been trying to make sense of the Wikipedia pages for the character classes and I had to pick a specific edition and extend it with a prestige class (the Assassin), so I hope fans of the game that are reading this article will accept this classification for its pedagogic interest…
Exporting a Hierarchy in JSON
Given this data set, our goal is to obtain a single JSON file that we can hand over to the d3js JavaScript library. Ideally, we would just run a single SQL query and the result would be a piece of JSON to hand-over to the browser so that the client rendering can now happen.
The result we need now should look something like this:
[
{
"Name": "Wizard",
"Sub Classes": ["Mage", "Specialist wizard"]
}
{
"Name": "Priest",
"Sub Classes": ["Cleric", "Druid", "Priest of specific mythos"]
}
{
"Name": "Warrior",
"Sub Classes": ["Fighter", "Paladin", "Ranger"]
}
{
"Name": "Rogue",
"Sub Classes": ["Thief", "Bard"]
}
{
"Name": "Rogue",
"Sub Classes": {
"Name": "Thief", "Sub Classes": ["Assassin"]
}
}
]
To be able to export our whole data set as a single JSON document, we need to recurse over it from the parents classes to their sub-classes. Not only that, we also need to accumulate the sub-classes into a single entity.
Now, it’s easy enough in SQL to run either from the set of top-level
classes, because we know they have NULL as their parent_id
, or from a
given class entry up the chain to its top-level ancestor, because we know
how to select a single row from its id
.
In this case though, we want to walk through our class graph from the top-level classes and still see from there all the sub-classes that are nested under the top-level entry. That’s more complex and can’t be done in a single step, obviously: you can’t have seen the bottom of the tree already when you start at its top.
Recursion’s First Step
The first step begins at the top of the tree, with those classes that don’t have a parent class, our top-level classes:
select id, name, '{}'::int[] as parents, 0 as level
from dndclasses
where parent_id is NULL
We have added computed columns to the base data set: both a parent array, built empty for now; and a recursion level, zero for the top-level items.
The PostgreSQL syntax to build an empty array of integers may look strange to you if you’re not used to it yet. It’s of course well documented in the Array Value Input section of the documentation.
Here’s our initial result set:
id │ name │ parents │ level
════╪═════════╪═════════╪═══════
1 │ Warrior │ {} │ 0
2 │ Wizard │ {} │ 0
3 │ Priest │ {} │ 0
4 │ Rogue │ {} │ 0
(4 rows)
WITH RECURSIVE
Now that we have a set of top-level classes, we want to add to the set their direct sub-classes, then loop over that extended set to find the next level of sub-classes, until we find no sub-classes any more. Well that’s exactly what WITH RECURSIVE is all about, automatically discovering how many steps need to be done:
with recursive dndclasses_from_parents as
(
select id, name, '{}'::int[] as parents, 0 as level
from dndclasses
where parent_id is NULL
union all
select c.id, c.name, parents || c.parent_id, level+1
from dndclasses_from_parents p
join dndclasses c
on c.parent_id = p.id
where not c.id = any(parents)
)
select name, id, parents, level
from dndclasses_from_parents;
Before trying to explain the query, let’s have a look at its result, in order for the more visuals of you to get plenty of hints already:
name │ id │ parents │ level
═══════════════════════════╪════╪═════════╪═══════
Warrior │ 1 │ {} │ 0
Wizard │ 2 │ {} │ 0
Priest │ 3 │ {} │ 0
Rogue │ 4 │ {} │ 0
Fighter │ 5 │ {1} │ 1
Paladin │ 6 │ {1} │ 1
Ranger │ 7 │ {1} │ 1
Mage │ 8 │ {2} │ 1
Specialist wizard │ 9 │ {2} │ 1
Cleric │ 10 │ {3} │ 1
Druid │ 11 │ {3} │ 1
Priest of specific mythos │ 12 │ {3} │ 1
Thief │ 13 │ {4} │ 1
Bard │ 14 │ {4} │ 1
Assassin │ 15 │ {4,13} │ 2
(15 rows)
A recursive query is written in two parts. The first part is executed only once and fetches our initial data set. The second part of the query is then executed and is allowed to reference the result of the query itself. That’s why it is recursive.
As an example, it is possible to define of the English term ancestor in a recursive fashion:
Your ancestors are your parents and their ancestors.
The trick is actually very simple: a recursive definition is a definition that uses its own term in its definition. Here, to define what is an ancestor we refer to your parents, and then their ancestors, which is the term being defined…
Back to our DnD character classes. Once the first arm of the union all
query is done, we have a set of data that we can refer to by the relation
name dndclasses_from_parents
. That’s what we do in the second arm of the
union all construct, to find all the rows having as parent one of the rows
we already have selected.
The magic of the WITH RECURSIVE form is that the second arm of the union all query is done repeatedly. At each step, a Work Table is built by running this recursive query part, and PostgreSQL only stops when the Work Table is empty. In our case, when there’s no subclasses to be found anymore.
In the recursive term of the query, we add new entries from the base table, and we also maintain our local state: the computed columns parents (an array of id values) and level (an integer that increments at each step).
When using PostgreSQL, the concatenation operator works on text values and
on array values too, so you can append a new item in an existing array using
the ||
SQL operator. That’s how we maintain our parents
array in the query.
Graph Cycles and Infinite Recursion
If the data set is not a Directed Acyclic Graph, you might have cycles in your data. Here it would mean that a sub-class would be found both in the above and below another class in the chart, which would likely not be intended. The cycle detection and prevention is done thanks to the following WHERE clause to the second part of the union all query:
where not c.id = any(parents)
From parents to children node, and back
Now, this result is very nice, but it’s not what we’ve been asked to deliver, if you remember correctly. Our quest consists of delivering a single JSON document listing every class and their sub-class as nested JSON document entries.
To do that, we need to recurse from the sub-classes up to their parents, so that at each level we are in a position to accumulate all the sub-classes into the single JSON document result, the top-level accumulation producing our query result.
It would be somehow bad news if we had to retrieve the result of our previous query into our application’s memory, only to send the data back to PostgreSQL in order to continue our processing.
Hopefully, PostgreSQL is well capable of daisy chaining a second RECURSIVE query using the result of the first. And the syntax for that is just what you would expect, another WITH part to the query.
In the following SQL query we use the result of the
dndclasses_from_parents
part to find the leaf nodes in our tree: classes
without sub-classes. That’s where we want to begin our bottom-up tree
traversal to build our JSON document:
with recursive dndclasses_from_parents as
(
-- Classes with no parent, our starting point
select id, name, '{}'::int[] as parents, 0 as level
from dndclasses
where parent_id is NULL
union all
-- Recursively find sub-classes and append them to the result-set
select c.id, c.name, parents || c.parent_id, level+1
from dndclasses_from_parents p
join dndclasses c
on c.parent_id = p.id
where not c.id = any(parents)
),
dndclasses_from_children as
(
-- Now start from the leaf nodes and recurse to the top-level
-- Leaf nodes are not parents (level > 0) and have no other row
-- pointing to them as their parents, directly or indirectly
-- (not id = any(parents))
select c.parent_id,
json_agg(jsonb_build_object('Name', c.name))::jsonb as js
from dndclasses_from_parents tree
join dndclasses c using(id)
where level > 0 and not id = any(parents)
group by c.parent_id
union all
-- build our JSON document, one piece at a time
-- as we're traversing our graph from the leaf nodes,
-- the bottom-up traversal makes it possible to accumulate
-- sub-classes as JSON document parts that we glue together
select c.parent_id,
jsonb_build_object('Name', c.name)
|| jsonb_build_object('Sub Classes', js) as js
from dndclasses_from_children tree
join dndclasses c on c.id = tree.parent_id
)
-- Finally, the traversal being done, we can aggregate
-- the top-level classes all into the same JSON document,
-- an array.
select jsonb_pretty(jsonb_agg(js))
from dndclasses_from_children
where parent_id IS NULL;
In this query, the dndclasses_from_parents
is the RECURSIVE query we saw
in the previous section already, and the dndclasses_from_children
then
walks up the tree aggregating classes as JSON documents. The outer query
then has to deal with as many rows as we have top-level classes, in our case
those are the Warrior, Wizard, Priest, and Rogue classes. We build a
JSON array to group them into a single valid document:
[
{
"Name": "Wizard",
"Sub Classes": [
{
"Name": "Mage"
},
{
"Name": "Specialist wizard"
}
]
}
{
"Name": "Priest",
"Sub Classes": [
{
"Name": "Cleric"
},
{
"Name": "Druid"
},
{
"Name": "Priest of specific mythos"
}
]
}
{
"Name": "Warrior",
"Sub Classes": [
{
"Name": "Fighter"
},
{
"Name": "Paladin"
},
{
"Name": "Ranger"
}
]
}
{
"Name": "Rogue",
"Sub Classes": [
{
"Name": "Thief"
},
{
"Name": "Bard"
}
]
}
{
"Name": "Rogue",
"Sub Classes": {
"Name": "Thief",
"Sub Classes": [
{
"Name": "Assassin"
}
]
}
}
]
Ta-dah! Job done! It’s now possible to pass the result over to the very nice visual and interactive Zoomable Circle Packing tool and enjoy a browsable data set. That’s left as an exercize to the reader though!
Conclusion
SQL is a very powerful declarative programming language. Tree traversal is covered in SQL thanks to WITH RECURSIVE, a standard feature that appeared in SQL-1999 and is available in many RDBMS. When coupled with PostgreSQL support for arrays and JSON, it’s a very useful feature to master!
If you liked this article and want to learn more SQL features, discover PostgreSQL advanced processing tools, and write less code, then consider reading my book! The Art of PostgreSQL teaches SQL to application developers!