You might have read it in the news already in Citus’ blog post by Sumedh Pathak: PostgreSQL Expert Dimitri Fontaine joins Citus Data. I am very happy to join a talented team here at Citus, and excited to work on an Open Source solution for distributed SQL on-top of PostgreSQL! In this article I’m going to cover my first technical contributions to Citus database, as it happens that a few patches of mine made it to the main source tree already.
TL;DR It’s good to be working on PostgreSQL related source code again, and to have the opportunity to solve PostgreSQL related problems at scale!
After having been involved in many migration projects over the last 10 years, I decided to publish the following White Paper in order to share my learnings.
The paper is titled Migrating to PostgreSQL, Tools and Methodology and details the Continuous Migration approach. It describes how to migrate from another relational database server technology to PostgreSQL. The reasons to do so are many, and first among them is often the licensing model.
As I’ve been organizing the first edition of pgDay Paris back in 2015, this very conference is special to me! It’s been an amazing experience launching a new PostgreSQL conference…
PostgreSQL ships with an interactive console with the command line tool named psql. It can be used both for scripting and interactive usage and is moreover quite a powerful tool. Interactive features includes autocompletion, readline support (history searches, modern keyboard movements, etc), input and output redirection, formatted output, and more.
Florent Fourcot has read Mastering PostgreSQL in Application Development and has seen tremendous inprovements in his production setup from reading the first chapters and applying the book advices to his use case.
Here’s an interview run with Florent where he explains the context in which such improvements has been made!
The Enterprise Edition of Mastering PostgreSQL in Application Development ships with a docker image that hosts both a PostgreSQL server instance with a pre-loaded database, the one that’s used throughout the book examples, and also with a Jupyter Network notebook that hosts SQL queries thanks to the sql_magic plugin.
The PostgreSQL community made the explicit choice some times ago that they would not use the infamous master and slave terminology. Instead, the documentation introduces the concepts of High Availability, Load Balancing, and Replication with the terms Primary and Standby, and the even more generic term Replica is used in contexts when only the data flow is considered, rather than the particular role of a node.
In How to Write SQL we saw how to write
SQL queries as separate
.sql files, and we learnt about using query
parameters with the psql syntax for that (
For writing our database model, the same tooling is all we need. An important aspect of using psql is its capacity to provide immediate feedback, and we can also have that with modeling too.
Our discovery led us to find albums containing tracks of multiple genres, and for the analytics we were then pursuing, we wanted to clean the data set and assign a single genre per album. We did that in SQL of course, and didn’t actually edit the data.
Finding the most frequent input value in a group is a job for the
WITHIN GROUP (ORDER BY sort_expression) Ordered-Set Aggregate Function, as
documented in the PostgreSQL page about Aggregate