Kris Jenkins cooked up a very nice way to embed SQL in your code: YeSQL for Clojure. The main idea is that you should write your SQL queries in .sql files in your code repository and maintain them there.
The idea is very good and it is now possible to find alternative implementations of the Clojure yesql library in other languages. Today, we are going to have a look at one of them for the python programming language: anosql.
SQL is code
When you have to write SQL code, several options are available. One of them is to consider that SQL is nothing more than a static literal string embedded in your code, and use the tools that your programming language provides for that, e.g. PHP heredoc or python triple-quoted strings.
Then, often enough you also want to integrate dynamic parts in your query string… and now either you concatenate string parts and variables or you need a templating facility of sorts. Or maybe you’re thinking that your ORM offers the perfect solution to that problem, allowing you to never resort to writing raw SQL yourself…
In any case, a time will come when you need to debug a query that runs in production, and now you need to find it again in your source code and edit it. When you’re lucky enough, you have a SQL-savvy person on the team, maybe even a DBA, so you analyze the query using EXPLAIN and rewrite it to solve your problem, maybe an efficiency-related one.
How easy is it for you to find the problematic query in your code, and then to replace it with the new version you just came up with? In a lot of cases, the new version will look nothing like the previous one. It might even use constructs that your ORM knows nothing about.
Let’s see an example query that works against the Chinook database. This database models a digital media store, including tables for artists, albums, media tracks, invoices, and customers. We have a table track with a milliseconds column, and we want to display both the time in a proper human-readable form and the percentage of each track duration with respect to the total duration of the album:
```sql
select name as title,
       milliseconds * interval '1ms' as duration,
       round(  milliseconds
             / sum(milliseconds) over ()
             * 100, 2) as pct
  from track
 where albumid = :id
order by trackid;
```
And here’s the output for :id = 1:
```
                  title                  |       duration       |  pct
-----------------------------------------+----------------------+-------
 For Those About To Rock (We Salute You) | @ 5 mins 43.719 secs | 14.32
 Put The Finger On You                   | @ 3 mins 25.662 secs |  8.57
 Let's Get It Up                         | @ 3 mins 53.926 secs |  9.75
 Inject The Venom                        | @ 3 mins 30.834 secs |  8.78
 Snowballed                              | @ 3 mins 23.102 secs |  8.46
 Evil Walks                              | @ 4 mins 23.497 secs | 10.98
 C.O.D.                                  | @ 3 mins 19.836 secs |  8.33
 Breaking The Rules                      | @ 4 mins 23.288 secs | 10.97
 Night Of The Long Knives                | @ 3 mins 25.688 secs |  8.57
 Spellbound                              | @ 4 mins 30.863 secs | 11.28
(10 rows)
```
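To make the arithmetic of the window function concrete, here is a small Python sketch that recomputes the duration and pct columns from raw millisecond values. The values are read back from the output above (for instance, @ 5 mins 43.719 secs is 343719 ms), and album_report is a hypothetical name used only for this illustration. The point to notice is that sum(milliseconds) over () is one single total computed over all the rows the where clause keeps, and every row is divided by that same total.

```python
from datetime import timedelta

# Track durations in milliseconds, read back from the psql output (album 1).
TRACKS = [
    ("For Those About To Rock (We Salute You)", 343719),
    ("Put The Finger On You", 205662),
    ("Let's Get It Up", 233926),
    ("Inject The Venom", 210834),
    ("Snowballed", 203102),
    ("Evil Walks", 263497),
    ("C.O.D.", 199836),
    ("Breaking The Rules", 263288),
    ("Night Of The Long Knives", 205688),
    ("Spellbound", 270863),
]


def album_report(tracks):
    """Mimic the SQL query: duration as a time span, pct of the album total."""
    # Equivalent of sum(milliseconds) over (): one total for the whole result set.
    total = sum(ms for _, ms in tracks)
    return [
        (title, timedelta(milliseconds=ms), round(ms / total * 100, 2))
        for title, ms in tracks
    ]


for title, duration, pct in album_report(TRACKS):
    print(f"{title:<40} {duration} {pct:6.2f}")
```

The rounded percentages come out the same as the pct column that PostgreSQL computed.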
Say I saved my query in an album.sql file and want to execute it against the particular album whose id I know:
```sh
psql --variable "id=1" -f album.sql chinook
```
Of course, it is also possible to create and edit the variables interactively, so I could instead have done the following from within a psql session:
```
> \cd path/to/my/sources
> \set id 1
> \i album.sql
```
The psql console being handy, you get autocompletion when entering file paths in both the \cd and \i commands.
So it is now quite easy to have your SQL query open in your favorite editor, next to a terminal window running the interactive psql console, where you can try your query with different values for your variables.
Dynamically building SQL queries
In some cases, your code will use almost the same query in different places, and it’s tempting to reduce duplication. What then happens is that you write conditional code to build the query string with entirely optional clauses: a where clause might be omitted in some cases, or maybe even a join. My advice is simple: don’t do that.
Instead, you can use a CASE expression within your conditions when you need conditional logic. The PostgreSQL documentation shows the following example, which also illustrates the evaluation rules of the expression:
```sql
SELECT ... WHERE CASE WHEN x <> 0 THEN y/x > 1.5 ELSE false END;
```
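The CASE guard can be tried without a PostgreSQL server at hand. The sketch below swaps in SQLite through Python’s standard sqlite3 module and uses a made-up table t(x, y), so it is only an approximation of the documentation example: it shows how the ELSE branch keeps rows where x is zero out of the division. Note that SQLite returns NULL on division by zero instead of raising an error as PostgreSQL does, so the guard matters more in PostgreSQL than it does here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; REAL columns avoid SQLite's integer division.
conn.execute("create table t (x real, y real)")
conn.executemany("insert into t values (?, ?)", [(0, 5), (2, 4), (1, 3), (4, 2)])

# Same shape as the PostgreSQL documentation example: y / x is only
# evaluated when x <> 0, otherwise the condition is false (0 in SQLite).
rows = conn.execute(
    """
    select x, y
      from t
     where case when x <> 0 then y / x > 1.5 else 0 end
  order by x
    """
).fetchall()
print(rows)  # [(1.0, 3.0), (2.0, 4.0)]
```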
So my advice is to Keep It Simple and have a SQL file for each main set of conditions. Then in your application code, you can pick the SQL file you need depending on the situation. You might end up with some SQL code duplication when doing so, that’s true, and it is the main argument against this approach. On the plus side, though, you have achieved better modularity: it is now really easy to fix that query you discovered to be problematic in your production logs. And it is even dead simple to replay and explain the query interactively, either in your development environment or even in production if necessary (it is, sometimes).
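Picking the SQL file in application code can stay very simple. Here is a minimal Python sketch of that idea; the queries/ directory, the album-tracks.sql and album-tracks-by-genre.sql file names, and the choose_query and load_sql functions are all hypothetical, made up for this illustration.

```python
from pathlib import Path
from typing import Optional

# Hypothetical layout: one complete SQL file per main set of conditions.
QUERY_DIR = Path("queries")
QUERIES = {
    "by_album": QUERY_DIR / "album-tracks.sql",
    "by_album_and_genre": QUERY_DIR / "album-tracks-by-genre.sql",
}


def choose_query(genre_id: Optional[int] = None) -> Path:
    """Pick a whole SQL file instead of splicing optional clauses together."""
    if genre_id is None:
        return QUERIES["by_album"]
    return QUERIES["by_album_and_genre"]


def load_sql(path: Path) -> str:
    # Each file holds one complete query that also runs as-is with psql -f.
    return path.read_text(encoding="utf-8")
```

The branching happens in python, on which file to use, while every query stays a complete piece of SQL that you can replay and explain on its own.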
Integrating SQL code in python with anosql
Kris Jenkins’s yesql makes it easy to implement your SQL in a query.sql file and then expose your queries as functions. It is now possible to do so in python too, thanks to anosql.
Here’s an example of what the album.sql file would look like with anosql:
```sql
-- name: list-tracks-by-albumid
-- List the tracks of an album, includes duration and position
select name as title,
       milliseconds * interval '1ms' as duration,
       round(  milliseconds
             / sum(milliseconds) over ()
             * 100, 2) as pct
  from track
 where albumid = :id
order by trackid;
```
The main difference from the previous example is that we added a couple of comment lines. The first one gives a name to the function, and the second one provides its python docstring. Here’s how you would use such a query file in your python code now:
```python
import anosql
import psycopg2


class chinook(object):
    """Our database model and queries"""

    def __init__(self, pgconnstring="dbname=chinook application_name=cdstore"):
        self.pgconn = psycopg2.connect(pgconnstring)
        self.genre = anosql.load_queries('postgres', 'genre.sql')
        self.artist = anosql.load_queries('postgres', 'artist.sql')
        self.album = anosql.load_queries('postgres', 'album.sql')

    def genre_list(self):
        return self.genre.tracks_by_genre(self.pgconn)

    def genre_top_n(self, n):
        return self.genre.genre_top_n(self.pgconn, n=n)

    def artist_by_albums(self, n):
        return self.artist.top_artists_by_album(self.pgconn, n=n)

    def album_details(self, albumid):
        # "-- name: list-tracks-by-albumid" is exposed as list_tracks_by_albumid()
        return self.album.list_tracks_by_albumid(self.pgconn, id=albumid)


def foo(albumid):
    db = chinook()
    for (title, duration, pct) in db.album_details(albumid):
        # ... do something here ...
        print(title, duration, pct)
```
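The query file format itself is simple enough to parse by hand. As an illustration only, here is a toy parser written from scratch; it is not anosql’s code, and parse_queries is a hypothetical name. It splits the file on the -- name: markers, turns the dashes of each query name into underscores (so that list-tracks-by-albumid becomes a valid python identifier), and keeps the comment lines that follow as the docstring.

```python
import re

ALBUM_SQL = """\
-- name: list-tracks-by-albumid
-- List the tracks of an album, includes duration and position
select name as title,
       milliseconds * interval '1ms' as duration,
       round(  milliseconds
             / sum(milliseconds) over ()
             * 100, 2) as pct
  from track
 where albumid = :id
order by trackid;
"""


def parse_queries(text):
    """Return a list of (name, docstring, sql) tuples from a yesql-style file."""
    queries = []
    # Each chunk starts right after a "-- name:" marker.
    for chunk in re.split(r"^--\s*name:\s*", text, flags=re.MULTILINE)[1:]:
        lines = chunk.splitlines()
        name = lines[0].strip().replace("-", "_")
        doc, sql = [], []
        for line in lines[1:]:
            # Comment lines before the SQL body form the docstring.
            if not sql and line.startswith("--"):
                doc.append(line[2:].strip())
            else:
                sql.append(line)
        queries.append((name, "\n".join(doc), "\n".join(sql).strip()))
    return queries


for name, doc, sql in parse_queries(ALBUM_SQL):
    print(name, "-", doc)
```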
The anosql library automatically exposes python functions with kwargs support for your queries, so that not only do you get to write proper SQL in the good way™ — which means in SQL files that you can version control — but also, all you have to do in your code is call functions or methods. And I guess you already know how to do that!
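The idea of turning each named query into a function that accepts keyword arguments fits in a few lines. The sketch below is hypothetical and is not anosql’s implementation: make_query_fn and the QUERIES dictionary are made up for illustration, the track rows are invented, and it uses Python’s built-in sqlite3 module, which understands :id style named placeholders natively, with a simplified version of the album query. With psycopg2, a loader would instead have to rewrite :id into psycopg2’s %(id)s placeholder style.

```python
import sqlite3
from types import SimpleNamespace

# Hypothetical named queries, as a yesql-style loader might have extracted
# them from a .sql file: name -> (docstring, sql with :named placeholders).
QUERIES = {
    "list_tracks_by_albumid": (
        "List the tracks of an album",
        """select name as title, milliseconds
             from track
            where albumid = :id
         order by trackid""",
    ),
}


def make_query_fn(name, doc, sql):
    """Wrap one SQL query into a function taking a connection and kwargs."""
    def fn(conn, **kwargs):
        # sqlite3 binds :id from the kwargs dict; no string concatenation.
        return conn.execute(sql, kwargs).fetchall()

    fn.__name__ = name
    fn.__doc__ = doc
    return fn


queries = SimpleNamespace(
    **{name: make_query_fn(name, doc, sql) for name, (doc, sql) in QUERIES.items()}
)

# A tiny in-memory stand-in for the Chinook track table (made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table track (trackid integer, albumid integer,"
    " name text, milliseconds integer)"
)
conn.executemany(
    "insert into track values (?, ?, ?, ?)",
    [(1, 1, "Intro", 30000), (2, 1, "Song", 90000), (3, 2, "Other", 50000)],
)

print(queries.list_tracks_by_albumid(conn, id=1))
# [('Intro', 30000), ('Song', 90000)]
```

Binding the values through the driver rather than formatting them into the string is what keeps the SQL file unchanged between psql and the application.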
You might have noticed that I use application_name=cdstore in my connection string. You can then see it in the pg_stat_activity system view and, when log_line_prefix includes %a, in the server logs. This very simple thing helps tremendously when you want to relate production activity with the SQL embedded in your application’s code. Do it now, and be as granular as you can (module names, class names, package names, etc.).
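To be granular without thinking about it each time, a small helper can derive the application_name from the code that opens the connection. The sketch below is hypothetical: app_name, conninfo and the dotted naming scheme are made up for illustration, and only application_name, pg_stat_activity and its columns are real PostgreSQL names. PostgreSQL documents application_name as a string of less than NAMEDATALEN characters (64 in a standard build), so the helper trims it to 63 characters up front.

```python
# Hypothetical helpers; application_name and pg_stat_activity are PostgreSQL names.
NAMEDATALEN = 64  # value in a standard PostgreSQL build

# Query to see what each tagged client is currently running.
ACTIVITY_SQL = """
select pid, application_name, state, query
  from pg_stat_activity
 where application_name like 'cdstore%'
"""


def app_name(*parts):
    """Join package/module/class names, e.g. cdstore.chinook.album_details."""
    name = ".".join(parts)
    # Keep it under NAMEDATALEN (characters equal bytes for ASCII names).
    return name[: NAMEDATALEN - 1]


def conninfo(dbname, *parts):
    """Build a libpq connection string with a quoted application_name."""
    def quote(value):
        # libpq conninfo syntax: single quotes, with ' and \ backslash-escaped.
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"

    return f"dbname={quote(dbname)} application_name={quote(app_name(*parts))}"


print(conninfo("chinook", "cdstore", "chinook", "album_details"))
```

The resulting string is meant to be passed to psycopg2.connect() the same way the class above passes its pgconnstring.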
One aspect of using SQL to its full capacity is having a productive editing environment with the same level of support you have when writing in another programming language, and more. With psql you also get an interactive REPL console to play with your code and adjust it.
Compared to using concatenated strings sprinkled all over your code, I don’t suppose I have to explain the benefits. Compared with an ORM, it means you have control over the SQL queries you send to your production server and are now actually able to have a productive talk and optimization session with your friendly DBA. Your DBA might even send you pull requests with better versions of your SQL, and if you already practice peer review you will learn a lot very fast!
This topic is an important one and you will find more about How to Write SQL in the book I am currently writing: Mastering PostgreSQL in Application Development!