pgloader is a tool that helps load data into PostgreSQL by adding some error
management to the COPY command. COPY is the fast way of loading data into
PostgreSQL and is transaction safe. That means that if a single error
appears anywhere in your batch of data, none of it gets loaded. pgloader
resubmits the data in smaller chunks until it is able to isolate the
bad rows from the good ones, and then the good ones are loaded.
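
To make the retry idea concrete, here is a minimal sketch in Python of splitting a rejected batch in half until the bad rows are isolated. It is not pgloader's actual code: `load_with_isolation`, `BatchRejected` and `fake_copy` are hypothetical names, and `fake_copy` only imitates the all-or-nothing behavior of COPY without talking to a database.

```python
from typing import Callable, List, Sequence, Tuple

Row = Tuple[str, ...]


class BatchRejected(Exception):
    """Raised by the (hypothetical) copy function when a batch contains a bad row."""


def load_with_isolation(
    rows: Sequence[Row],
    copy_batch: Callable[[Sequence[Row]], None],
) -> Tuple[List[Row], List[Row]]:
    """Try the whole batch; if it is rejected, split it in two and retry each half.

    Returns (loaded_rows, rejected_rows).
    """
    if not rows:
        return [], []
    try:
        copy_batch(rows)  # all-or-nothing, like a COPY inside a transaction
        return list(rows), []
    except BatchRejected:
        if len(rows) == 1:
            # A single row that still fails is a bad row: set it aside.
            return [], list(rows)
        mid = len(rows) // 2
        left_ok, left_bad = load_with_isolation(rows[:mid], copy_batch)
        right_ok, right_bad = load_with_isolation(rows[mid:], copy_batch)
        return left_ok + right_ok, left_bad + right_bad


def fake_copy(batch: Sequence[Row]) -> None:
    """Stand-in for COPY (assumption): rejects any batch with a non-numeric id."""
    for row in batch:
        if not row[0].isdigit():
            raise BatchRejected(row)


rows = [("1", "a"), ("x", "b"), ("3", "c"), ("4", "d")]
loaded, rejected = load_with_isolation(rows, fake_copy)
```

With this input the second row ends up in `rejected` and the other three in `loaded`. Halving keeps the number of retries low when bad rows are rare, which is the case where it pays off most.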

In a recent migration project where we freed data from MySQL into
PostgreSQL, we used pgloader again. But the loading time was not fast enough
for the service downtime window that we had. Indeed, Python is not known
for being the fastest solution around. It's easy to use and to ship to
production, but sometimes being efficient when writing code is not enough:
you also need the code to actually run fast.
Faster data loading
So I began writing a little dedicated tool for that migration in Common Lisp, which is growing on me as my personal answer to the burning question: Python 2 or Python 3? I find Common Lisp to offer an even more dynamic programming environment, an easier language to use, and the result often has performance characteristics way beyond what I can get with Python: between 5 times faster and 121 times faster in some quite stupid benchmark.
Here, with real data, my one-shot attempt has been running more than twice as fast as the Python version, after about a day of programming.

The other thing here is that I've attempted to get pgloader to work in
parallel, but at the time I didn't know about the Global Interpreter Lock,
which they still haven't found a way to remove in Python 3, by the way. So my
threading attempts at making pgloader work in parallel are pretty useless.
Whereas in Common Lisp I can just use the lparallel lib, which exposes threading facilities and some queueing facilities as a means to communicate data between workers, and have my code easily work in parallel for real.
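
To illustrate the shape of that design (workers that talk to each other through a queue), here is a small sketch using Python's standard `threading` and `queue` modules. It is not lparallel and not the code of the new tool: `reader`, `writer` and `run_pipeline` are hypothetical names, and "loading" is simulated by appending to a list. In CPython the Global Interpreter Lock still serializes the CPU-bound parts, so this shows the pattern only, not the speedup.

```python
import queue
import threading
from typing import Iterable, List

SENTINEL = None  # pushed by the reader to tell the writer that the stream is over


def reader(rows: Iterable, q: queue.Queue, batch_size: int) -> None:
    """Producer: group incoming rows into batches and push them onto the queue."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            q.put(batch)
            batch = []
    if batch:  # flush the last, possibly smaller, batch
        q.put(batch)
    q.put(SENTINEL)


def writer(q: queue.Queue, sink: List) -> None:
    """Consumer: pop batches until the sentinel shows up, then stop."""
    while True:
        batch = q.get()
        if batch is SENTINEL:
            break
        sink.extend(batch)  # stand-in for sending the batch to PostgreSQL


def run_pipeline(rows: Iterable, batch_size: int = 2) -> List:
    """Run one reader and one writer concurrently, connected by a bounded queue."""
    q: queue.Queue = queue.Queue(maxsize=4)  # bounded, so the reader cannot run far ahead
    sink: List = []
    threads = [
        threading.Thread(target=reader, args=(rows, q, batch_size)),
        threading.Thread(target=writer, args=(q, sink)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sink


result = run_pipeline(range(5))
```

With a single writer the rows come out in the order they went in. The bounded queue gives some back-pressure: when the writer is slow, the reader blocks on `put` rather than piling up batches in memory.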
Compatibility
The only drawback that I can see here is that if you've been writing your
own reformatting modules in Python for pgloader (yes, you can
implement your own reformatting module for pgloader), then you would have to
port them to Common Lisp. Drop me an email if that's your case.
Next version
So, I think we're going to have a pgloader 3 someday, that will be way
faster than the current one, and bundle some more features: real parallel
behavior, the ability to fetch non-local data (connecting to MySQL directly,
or HTTP, S3, etc.), and I'm thinking about offering a COPY-like syntax to
drive the loading too, while at it. Also, the ability to discover the set of
data to load all by itself when you want to load a whole database: think of
it as a special migration mode of operation.
Some feature requests can't be solved easily when keeping the old .INI
syntax cruft, so it's high time to implement some kind of a real command
language. I have several ideas about those, in between the COPY syntax and
the SQL*Loader configuration format, which is both clunky and quite
powerful, too.
After a beginning in Tcl and a complete rewrite in Python in 2005, it looks
like 2013 is going to be the year of pgloader 3, in Common Lisp!

