In a previous article here we saw How to Write SQL in your application code. The main idea in that article is to maintain your queries in separate SQL files, where it’s easier to maintain them. In particular if you want to be able to test them again in production, and when you have to work and rewrite queries.
Naturally, if you consider SQL as code (which I do), then you want all the usual tooling around code. It’s easy to benefit from version control when your SQL files are stored in separate files, so that’s a checked box. It might be less easy to unit test and regression test your queries, you think… In this article we see how to do that!
There are different ways to test your SQL queries, among them in good position we can do Unit Testing and Regression Testing. The Unit Testing is used to run your queries against static arguments and a static data set, and allows to test very basic things. The assumptions that you want to never break, because your code rely on them.
Regression Testing is about ensuring that when you change things, they continue to implement the same contract. In term of SQL, you can either change your data set (adding columns, altering data types, etc) or your queries. If you run the same query against a different schema or a modified data set that should be compatible it might be named Integrity Testing.
Unit Testing SQL
For Unit Testing in PostgreSQL the perfect tool is pgTap: pgTAP is a suite of database functions that make it easy to write TAP-emitting unit tests in psql scripts or xUnit-style test functions. The TAP output is suitable for harvesting, analysis, and reporting by a TAP harness, such as those used in Perl applications.
When using pgTap, see the relation-testing functions for implementing result set based unit tests. From the documentation, let’s pick a couple example, testing against static result sets as VALUES:
SELECT results_eq(
'SELECT * FROM active_users()',
$$
VALUES (42, 'Anna'),
(19, 'Strongrrl'),
(39, 'Theory')
$$,
'active_users() should return active users'
);
and ARRAYS:
SELECT results_eq(
'SELECT * FROM active_user_ids()',
ARRAY[ 2, 3, 4, 5]
);
As you can see your unit tests are coded in SQL too. Which means you have all the SQL power at your hands to write tests, and also that you can check your schema integrity directly in SQL, using PostgreSQL catalog functions.
So check out pgTap to implement PostgreSQL Unit Tests.
RegreSQL
In my article How to Write SQL the main take out is that your SQL queries are to be managed in .sql files, just like in the following example:
-- name: list-albums-by-artist
-- List the album titles and duration of a given artist
select album.title as album,
sum(milliseconds) * interval '1 ms' as duration
from album
join artist using(artistid)
left join track using(albumid)
where artist.name = :'name'
group by album
order by album;
Such a query is expected to be processed either by psql
directly thanks to
its
psql variables support,
or by a library like
Clojure’s YeSQL or
Python’s anosql for using in your
application’s code base. Implementations exist in other programming
languages too of course, so check out if your favorite is listed already at
the main YeSQL GitHub’s page.
In the put your money where your mouth is department, allow me to unveil the small and neat tool RegreSQL:
$ go get github.com/dimitri/regresql
$ regresql --help
Run regression tests for your SQL queries
Usage:
regresql [command]
Available Commands:
help Help about any command
init Initialize regresql for use in your project
list list candidates SQL files
plan Creates missing plans for new queries
test Run regression tests for your SQL queries
update Creates or updates the expected output files
The idea of RegreSQL is to make it easy to run Regression Tests against SQL queries that you manage with a YeSQL like library. In a local sample application using the query shown above, we can use RegreSQL to run our queries and check that their result is the expected one:
$ regresql test
Connecting to 'postgres:///chinook?sslmode=disable'… ✓
TAP version 13
ok 1 - src/sql/album-by-artist.1.out
ok 2 - src/sql/album-tracks.1.out
ok 3 - src/sql/artist.1.out
ok 4 - src/sql/genre-topn.top-3.out
ok 5 - src/sql/genre-topn.top-1.out
ok 6 - src/sql/genre-tracks.out
To be able to do that, we need parameters to associate with our queries.
RegreSQL calls that a test plan, and you can create your test plans with
the command regresql init
or regresql plan
if you’re already initialized
a repository and just added some queries…
If we take the regresql/plans/src/sql/genre-topn.yaml plan file as an example, it contains two sets of parameters for the src/sql/genre-topn.sql query. The YAML file contains plans. Each plan has a name, then a list of key and values. The keys are the SQL parameter names, and the values are the actual values we want to test our queries with. Here’s the genre-topn.yaml content:
"top-3":
"n": "3"
"top-1":
"n": "1"
The RegreSQL tool creates empty plan files with a single plan named “1”, and prefills it with empty string values for all the query parameters, so that you only have to fill-in the actual values for testing.
Use regresql update
to run the queries and register their output as the
expected ouput. In the following usage example, we did regresql update
with "n": "1"
in our plan file, but then changed our mind and replaced it
with "n": "2"
before running regresql test
:
$ regresql test
Connecting to 'postgres:///chinook?sslmode=disable'… ✓
TAP version 13
ok 1 - src/sql/album-by-artist.1.out
ok 2 - src/sql/album-tracks.1.out
# Query File: 'src/sql/artist.sql'
# Bindings File: 'regresql/plans/src/sql/artist.yaml'
# Bindings Name: '1'
# Query Parameters: 'map[n:2]'
# Expected Result File: 'regresql/expected/src/sql/artist.1.out'
# Actual Result File: 'regresql/out/src/sql/artist.1.out'
#
# --- regresql/expected/src/sql/artist.1.out
# +++ regresql/out/src/sql/artist.1.out
# @@ -1,4 +1,5 @@
# - name | albums
# -------------+-------
# -Iron Maiden | 21
# + name | albums
# +-------------+-------
# +Iron Maiden | 21
# +Led Zeppelin | 14
#
not ok 3 - src/sql/artist.1.out
ok 4 - src/sql/genre-topn.top-3.out
ok 5 - src/sql/genre-topn.top-1.out
ok 6 - src/sql/genre-tracks.out
We can see that the query returns the 2 most prolific artists of our collection rather than the expected single most prolific artist. If that’s your new requirements for the query, then update the expected result. If not, then fix your query!
Conclusion
SQL is code. We need a productive and complete tooling around it, and most of this tooling is already available. It’s quite easy to fill-in the gaps, as soon as you start considering that SQL is code.