SQL Regression Tests

In a previous article here we saw How to Write SQL in your application code. The main idea in that article is to maintain your queries in separate SQL files, where it’s easier to maintain them. In particular if you want to be able to test them again in production, and when you have to work and rewrite queries.

Naturally, if you consider SQL as code (which I do), then you want all the usual tooling around code. It’s easy to benefit from version control when your SQL files are stored in separate files, so that’s a checked box. It might be less easy to unit test and regression test your queries, you think… In this article we see how to do that!

Contents

There are different ways to test your SQL queries, among them in good position we can do Unit Testing and Regression Testing. The Unit Testing is used to run your queries against static arguments and a static data set, and allows to test very basic things. The assumptions that you want to never break, because your code rely on them.

Regression Testing is about ensuring that when you change things, they continue to implement the same contract. In term of SQL, you can either change your data set (adding columns, altering data types, etc) or your queries. If you run the same query against a different schema or a modified data set that should be compatible it might be named Integrity Testing.

Unit Testing SQL

For Unit Testing in PostgreSQL the perfect tool is pgTap: pgTAP is a suite of database functions that make it easy to write TAP-emitting unit tests in psql scripts or xUnit-style test functions. The TAP output is suitable for harvesting, analysis, and reporting by a TAP harness, such as those used in Perl applications.

When using pgTap, see the relation-testing functions for implementing result set based unit tests. From the documentation, let’s pick a couple example, testing against static result sets as VALUES:

SELECT results_eq(
    'SELECT * FROM active_users()',
    $$
      VALUES (42, 'Anna'),
             (19, 'Strongrrl'),
             (39, 'Theory')
    $$,
    'active_users() should return active users'
);

and ARRAYS:

SELECT results_eq(
    'SELECT * FROM active_user_ids()',
    ARRAY[ 2, 3, 4, 5]
);

As you can see your unit tests are coded in SQL too. Which means you have all the SQL power at your hands to write tests, and also that you can check your schema integrity directly in SQL, using PostgreSQL catalog functions.

So check out pgTap to implement PostgreSQL Unit Tests.

RegreSQL

In my article How to Write SQL the main take out is that your SQL queries are to be managed in .sql files, just like in the following example:

-- name: list-albums-by-artist
-- List the album titles and duration of a given artist
  select album.title as album,
         sum(milliseconds) * interval '1 ms' as duration
    from album
         join artist using(artistid)
         left join track using(albumid)
   where artist.name = :'name'
group by album
order by album;

Such a query is expected to be processed either by psql directly thanks to its psql variables support, or by a library like Clojure’s YeSQL or Python’s anosql for using in your application’s code base. Implementations exist in other programming languages too of course, so check out if your favorite is listed already at the main YeSQL GitHub’s page.

In the put your money where your mouth is department, allow me to unveil the small and neat tool RegreSQL:

$ go get github.com/dimitri/regresql

$ regresql --help
Run regression tests for your SQL queries

Usage:
  regresql [command]

Available Commands:
  help        Help about any command
  init        Initialize regresql for use in your project
  list        list candidates SQL files
  plan        Creates missing plans for new queries
  test        Run regression tests for your SQL queries
  update      Creates or updates the expected output files

The idea of RegreSQL is to make it easy to run Regression Tests against SQL queries that you manage with a YeSQL like library. In a local sample application using the query shown above, we can use RegreSQL to run our queries and check that their result is the expected one:

$ regresql test
Connecting to 'postgres:///chinook?sslmode=disable'… ✓
TAP version 13
ok 1 - src/sql/album-by-artist.1.out
ok 2 - src/sql/album-tracks.1.out
ok 3 - src/sql/artist.1.out
ok 4 - src/sql/genre-topn.top-3.out
ok 5 - src/sql/genre-topn.top-1.out
ok 6 - src/sql/genre-tracks.out

To be able to do that, we need parameters to associate with our queries. RegreSQL calls that a test plan, and you can create your test plans with the command regresql init or regresql plan if you’re already initialized a repository and just added some queries…

If we take the regresql/plans/src/sql/genre-topn.yaml plan file as an example, it contains two sets of parameters for the src/sql/genre-topn.sql query. The YAML file contains plans. Each plan has a name, then a list of key and values. The keys are the SQL parameter names, and the values are the actual values we want to test our queries with. Here’s the genre-topn.yaml content:

"top-3":
  "n": "3"
"top-1":
  "n": "1"

The RegreSQL tool creates empty plan files with a single plan named “1”, and prefills it with empty string values for all the query parameters, so that you only have to fill-in the actual values for testing.

Use regresql update to run the queries and register their output as the expected ouput. In the following usage example, we did regresql update with "n": "1" in our plan file, but then changed our mind and replaced it with "n": "2" before running regresql test:

$ regresql test
Connecting to 'postgres:///chinook?sslmode=disable'… ✓
TAP version 13
ok 1 - src/sql/album-by-artist.1.out
ok 2 - src/sql/album-tracks.1.out
# Query File: 'src/sql/artist.sql'
# Bindings File: 'regresql/plans/src/sql/artist.yaml'
# Bindings Name: '1'
# Query Parameters: 'map[n:2]'
# Expected Result File: 'regresql/expected/src/sql/artist.1.out'
# Actual Result File: 'regresql/out/src/sql/artist.1.out'
# 
# --- regresql/expected/src/sql/artist.1.out
# +++ regresql/out/src/sql/artist.1.out
# @@ -1,4 +1,5 @@
# -   name     | albums
# -------------+-------
# -Iron Maiden | 21
# +    name     | albums
# +-------------+-------
# +Iron Maiden  | 21
# +Led Zeppelin | 14
#  
not ok 3 - src/sql/artist.1.out
ok 4 - src/sql/genre-topn.top-3.out
ok 5 - src/sql/genre-topn.top-1.out
ok 6 - src/sql/genre-tracks.out

We can see that the query returns the 2 most prolific artists of our collection rather than the expected single most prolific artist. If that’s your new requirements for the query, then update the expected result. If not, then fix your query!

Conclusion

SQL is code!

SQL is code. We need a productive and complete tooling around it, and most of this tooling is already available. It’s quite easy to fill-in the gaps, as soon as you start considering that SQL is code.