An SQL Injection is a security breach, one made famous by the Exploits of
a Mom xkcd
comic episode in which we read about
little Bobby Tables:
PostgreSQL implements a protocol level facility to send the static SQL query text separately from its dynamic arguments. An SQL injection happens when the database server is mistakenly led to consider a dynamic argument of a query as part of the query text. Sending those parts as separate entities over the protocol means that SQL injection is no longer possible.
Table of Contents
PostgreSQL protocol: PQExecParam
The PostgreSQL protocol is fully documented and you can read more about
extended query support on the Message
Flow
documentation page. Also relevant is the PQexecParams
driver API,
documented as part of the command execution
functions
of the libpq
PostgreSQL C driver.
A lot of PostgreSQL application drivers are based on the libpq C driver,
which implements the PostgreSQL protocol and is maintained alongside the
main server’s code. Some drivers variants also exist that don’t link to any
C runtime, in which case the PostgreSQL protocol has been implemented in
another programming language. That’s the case for variants of the JDBC
driver, and the pq
Go driver too, among others.
It is advisable that you read the documentation of your current driver and understand how to send SQL query parameters separately from the main SQL query text; this is a reliable way to never have to worry about SQL injection problems ever again.
In particular, never build a query string by concatenating your query arguments directly into your query strings, i.e. in the application client code. Never use any library, ORM or another tooling that would do that. When building SQL query strings that way, you open your application code to serious security risk for no reason.
PostgreSQL protocol: server-side prepared statements
Another way to send the query string and its arguments separately on the
wire is to use server-side prepared statements. This is a pretty common way
to do it, mostly because PQexecParams
isn’t well known, though it made its
debut in PostgreSQL 7.4, released November 17, 2003. To this day, a lot of
PostgreSQL drivers still don’t expose this facility.
Server-side Prepared Statements can be used in SQL thanks to the PREPARE and EXECUTE commands syntax, as in the following example:
prepare foo as
select date, shares, trades, dollars
from factbook
where date >= $1::date
and date < $1::date + interval '1 month'
order by date;
PREPARE
And then you can execute the prepared statement with a parameter that way,
still at the psql
console:
execute foo('2010-02-01');
If you’re curious about it, here’s the result we have when running this query:
date │ shares │ trades │ dollars
════════════╪════════════╪═════════╪═════════════
2010-02-01 │ 1558526732 │ 5633190 │ 43290463362
2010-02-02 │ 1768180556 │ 6148888 │ 48391414625
2010-02-03 │ 1603665758 │ 5693174 │ 44986991925
2010-02-04 │ 2213497823 │ 7717240 │ 60148012581
2010-02-05 │ 2427569880 │ 8905315 │ 65664171455
2010-02-08 │ 1613044351 │ 5812392 │ 43592103468
2010-02-09 │ 1935306014 │ 7027904 │ 50413934490
2010-02-10 │ 1553714023 │ 5733271 │ 40915973371
2010-02-11 │ 1648721018 │ 5939464 │ 44934557649
2010-02-12 │ 2130203765 │ 6159665 │ 69545693638
2010-02-16 │ 1617687910 │ 5258883 │ 45638709582
2010-02-17 │ 1523567498 │ 5207224 │ 40810393758
2010-02-18 │ 1432125288 │ 4953840 │ 40105345403
2010-02-19 │ 1556863679 │ 4807694 │ 45236985452
2010-02-22 │ 1386189749 │ 4807423 │ 40330077452
2010-02-23 │ 1609958052 │ 5682556 │ 44853459493
2010-02-24 │ 1552246071 │ 5405469 │ 42994120717
2010-02-25 │ 1766446801 │ 6100559 │ 49503093455
2010-02-26 │ 1781712668 │ 5197619 │ 49390248716
(19 rows)
Now, while it’s possible to use the prepare and execute SQL commands directly in your application code, it is also possible to use it directly at the PostgreSQL protocol level. This facility is named Extended Query and is well documented.
Reading the documentation about the protocol implementation, we see the following bits. First the PARSE message:
In the extended protocol, the frontend first sends a Parse message, which contains a textual query string, optionally some information about data types of parameter placeholders, and the name of a destination prepared-statement object […]
Then, the BIND message:
Once a prepared statement exists, it can be readied for execution using a Bind message. […] The supplied parameter set must match those needed by the prepared statement.
Finally, to receive the result set the client needs to send a third message, the EXECUTE message. The details of this part aren’t relevant to this blog post though.
It is very clear from the documentation excerpts above that the query string parsed by PostgreSQL doesn’t contain the parameters. The query string is sent in the BIND message. The query parameters are sent in the EXECUTE message. When doing things that way, it is impossible to have SQL injections.
Remember: SQL injection happens when the SQL parser is fooled into believing that a parameter string is in fact a SQL query, and then the SQL engine goes on and executes that SQL statement. When the SQL query string lives in your application code, and the user-supplied parameters are sent separately on the network, there’s no way that the SQL parsing engine might get confused.
A good example in Python, using asyncpg
The following example uses the asyncpg PostgreSQL driver. It’s open source and the sources are available at the MagicStack/asyncpg repository, where you can browse the code and see that the driver implements the PostgreSQL protocol itself, and uses server-side prepared statements.
The following piece of code is an example of using the asyncpg driver. The example is safe from SQL injection by design, because the server-side prepared statement protocol sends the query string and its arguments in separate protocol messages.
Here’s some client application code:
async def fetch_month_data(year, month):
"Fetch a month of data from the database"
date = datetime.date(year, month, 1)
sql = """
select date, shares, trades, dollars
from factbook
where date >= $1::date
and date < $1::date + interval '1 month'
order by date;
"""
pgconn = await asyncpg.connect(CONNSTRING)
stmt = await pgconn.prepare(sql)
res = {}
for (date, shares, trades, dollars) in await stmt.fetch(date):
res[date] = (shares, trades, dollars)
await pgconn.close()
return res
When you’re using a proper SQL integration
mechanism such as
anosql, it becomes very easy to both work
on your queries interactively at the psql
prompt then integrate them in
your code base, or use them directly from the code base in the psql
interactive REPL. That includes collaborating easily with your DBA.
A wrong example in Python, using psycopg2
In the following example We are using the
psycopg Python driver. Psycopg is based on the
PostgreSQL C implementation of the client-server protocol, libpq
. The
documentation of this driver addresses passing parameters to SQL
queries
right from the beginning.
Psycopg is not making good use of the functionality we just described, and
our factbook-month.py
program above makes use of the %s
syntax for SQL
query arguments. The interpolating of the query arguments is done on the
client side by the psycopg code, and a full query string is then sent to the
PostgreSQL server:
def fetch_month_data(year, month):
"Fetch a month of data from the database"
date = "%d-%02d-01" % (year, month)
sql = """
select date, shares, trades, dollars
from factbook
where date >= date %s
and date < date %s + interval '1 month'
order by date;
"""
pgconn = psycopg2.connect(CONNSTRING)
curs = pgconn.cursor()
curs.execute(sql, (date, date))
res = {}
for (date, shares, trades, dollars) in curs.fetchall():
res[date] = (shares, trades, dollars)
return res
So while it looks like the code is doing the right thing by passing the arguments separately from the query string in the Python code, as seen in the following line, you still need to trust Psycopg2 to be free of any SQL injection faults.
# This looks fine. The interpolation still happens client side. Oops.
curs.execute(sql, (date, date))
Conclusion
SQL Injection is still a pretty serious problem in modern software development. It’s a huge security risk, and it’s surprisingly easy to protect your applications from this risk forever, at least when using PostgreSQL.
At the protocol level, PostgreSQL knows how to receive the SQL query strings
separately from the sql parameters, often supplied dynamically by your
users. The PostgreSQL protocol even comes with two different ways to
implement that, either using the PQexecParams method from libpq
or the
Extended Query message flow when doing server-side prepared statements: the
famous parse/bind/execute sequence of messages.
Say no to SQL injection today! Check your driver’s implementation and how you’re using it in your application code, and make it so that parameters are sent separately from the query strings. Then breathe easily and have a good week-end folks!