Continuing our series on PostgreSQL Data Types, today we're going to introduce the PostgreSQL text data type. The first notion to understand when processing text in any program is, of course, the notion of encoding.

So when addressing the text data type we must mention encoding settings, and possibly encoding issues too. An encoding is a particular representation of characters in bits and bytes. In the ASCII encoding the letter A is encoded as the 7-bit value 1000001, or 65 in decimal, or 41 in hexadecimal. Those are all the same number, written the same way on disk, and so is the letter A.
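As a quick, minimal sketch in psql (assuming any recent PostgreSQL), you can check which encoding your database uses and move between the letter A and its code point with the built-in chr, ascii, and to_hex functions:

```sql
-- which encoding does this database write on disk?
show server_encoding;

-- the letter A, its decimal code point, and that code point in hexadecimal
select chr(65) as letter, ascii('A') as code, to_hex(ascii('A')) as hex;
```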

Today, we're going to begin a dive into the PostgreSQL Data Types. As my colleague Will Leinweber said recently in Constraints: a Developer's Secret Weapon, the talk he gave at pgDay Paris: database constraints in Postgres are the last line of defense.

The most important of those constraints is the data type, or the attribute domain in normalization parlance. When you declare an attribute to be of a given data type, PostgreSQL ensures that this property is always true, and then implements advanced processing features for each data type, so that you may push the computation to the data when needed.

This article is the first of a series that will go through many of the PostgreSQL data types, and we open the journey with boolean.
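To make that concrete, here's a minimal sketch (the table name is made up for illustration) of what the boolean guarantee looks like in practice: PostgreSQL refuses any value that isn't a proper boolean, so the property holds for every row ever stored:

```sql
-- a hypothetical table with a boolean attribute
create table event
 (
   id   bigserial primary key,
   done boolean not null
 );

insert into event (done) values (true), (false);

-- this fails, keeping the domain intact:
-- ERROR:  invalid input syntax for type boolean: "maybe"
insert into event (done) values ('maybe');
```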

PostgreSQL is the world’s most advanced open source database, and per the PostgreSQL Wikipedia page it is an object-relational database management system (ORDBMS) with an emphasis on extensibility and standards compliance.

In this article, we try to understand why PostgreSQL would be named an object-relational thing. What is Object Oriented Programming and how does that apply to a database system?
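As a first hint of the answer, and as a hedged sketch rather than a full treatment, PostgreSQL lets you create your own types and then use them as attribute domains, which is one way the object-relational label shows up in practice (the complex composite type is a classic example from the PostgreSQL documentation):

```sql
-- a user-defined composite type
create type complex as (r double precision, i double precision);

create table samples
 (
   id    bigserial primary key,
   value complex
 );

insert into samples (value) values (row(1.0, -1.0));

-- accessing the fields of a composite value
select (value).r, (value).i from samples;
```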

In our previous article we saw three classic Database Modelization Anti-Patterns. The article also contains a reference to a Primary Key section of my book Mastering PostgreSQL in Application Development, so it’s only fair that I would now publish said Primary Key section!

So in this article, we dive into Primary Keys as a cornerstone of database normalization. It's so important to get Primary Keys right that you would think everybody knows how to do it, and yet, most of the primary key constraints I've seen used in database design are actually not primary keys at all.
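As a hypothetical sketch of what that means: a surrogate key alone guarantees nothing about your real-world data, so a table designed like the one below still accepts duplicate users; declaring the natural key too is what makes the model trustworthy:

```sql
-- the id column is unique, but nothing prevents registering
-- the same email address twice
create table users
 (
   id    bigserial primary key,
   email text not null
 );

-- declaring the natural key restores the guarantee
alter table users add unique (email);
```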

The current trend in software deployments is to rely on open source software for entire production stacks. You can find open source software in the core technical stacks of every startup out there, I'm told. And if you're using Cloud based offerings, most Cloud providers run Free/Libre Open Source Software as their foundation.

This article is a deep dive into the economic models behind successful open source projects and communities, and into how, as a professional, enterprise grade user, you depend on the long-term sustainability of all the open source projects you're using. And because you depend on the projects you're using being successful, it also looks at how to contribute and help guarantee that success.

Today I want to react to an article claiming that Relational Algebra Is the Root of SQL Problems, in which the author hand-waves the following position:

SQL becomes more a hindrance to data manipulation than an efficient tool. SQL’s greatest problem isn’t in the implementation level, but at its theory foundation. The problem can’t be solved by application optimization. Relational algebra isn’t sophisticated enough for handling the complicated data manipulation scenarios.

Then they go on to make several arguments from authority to "prove" their point. My reading of the article is that SQL is very hard when you haven't cared to learn it, as most technologies are.

In this article, we're going to look at the simple examples provided, where apparently SQL makes it so much harder to find a solution than writing some Java or C++ code. Contrary to the original article, we go as far as actually writing both the SQL solution and a complete Python solution, so that we can compare them.
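The original article's examples aren't reproduced here; as a hypothetical illustration of the kind of puzzle involved, and of how window functions keep the SQL short, here's one way to compute the longest streak of consecutive rising values (the stock_prices table and its columns are made up for this sketch):

```sql
-- find the longest streak of days where the closing price went up;
-- stock_prices(day date, close numeric) is a hypothetical table
with diffs as
 (
   select day,
          close > lag(close) over (order by day) as rising
     from stock_prices
 ),
 streaks as
 (
   -- grp stays constant within a streak of rising days, because it only
   -- increments on days that are not rising
   select day, rising,
          count(*) filter (where rising is not true)
            over (order by day) as grp
     from diffs
 )
select count(*) as streak_length
  from streaks
 where rising
 group by grp
 order by streak_length desc
 limit 1;
```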

In another article here, entitled on JSON and SQL, we saw in great detail how to import a data set only available as a giant JSON file. We then normalized the data set, so as to be able to write SQL and process our data. This approach is sometimes very useful, and it was a good way to learn some of the JSON functions provided by PostgreSQL.

In this article, we’re going to use SQL to export the data from our relational model into a JSON document. The trick that makes it complex in this example is that we have a recursive data model, with a notion of a parent row that exists in the same table as the current one. That’s a nice excuse to learn more about the SQL construct WITH RECURSIVE.
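To give the flavor of it, here's a minimal sketch (the category table and its columns are invented for this example) that walks such a parent/child model with WITH RECURSIVE and aggregates each row's path into a JSON document; producing a fully nested document takes more work than this, but the recursion is the same:

```sql
-- a hypothetical self-referencing model
create table category
 (
   id        bigint primary key,
   parent_id bigint references category (id),
   name      text not null
 );

-- walk the tree from the roots down, accumulating each row's path
with recursive tree as
 (
     select id, parent_id, name,
            jsonb_build_array(name) as path
       from category
      where parent_id is null

  union all

     select c.id, c.parent_id, c.name,
            t.path || to_jsonb(c.name)
       from category c
       join tree t on t.id = c.parent_id
 )
select jsonb_pretty(jsonb_agg(jsonb_build_object('id', id, 'path', path)))
  from tree;
```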

It seems usual nowadays to review the previous year, and readers apparently like Top-N lists. That's you now, so let's hope that my understanding works with you too.

Of course 2018 will see its own amount of new and original content added to this blog, with a continuous focus on how to make the best out of the powerful SQL programming language and its advanced concurrency semantics.

You might have read it in the news already in Citus' blog post by Sumedh Pathak: PostgreSQL Expert Dimitri Fontaine joins Citus Data. I am very happy to join a talented team here at Citus, and excited to work on an Open Source solution for distributed SQL on top of PostgreSQL! In this article I'm going to cover my first technical contributions to the Citus database, as a few patches of mine have already made it into the main source tree.

TL;DR It’s good to be working on PostgreSQL related source code again, and to have the opportunity to solve PostgreSQL related problems at scale!
