Hannu just gave me a good idea in this email on -hackers, proposing that pg_basebackup should fetch the xlog files again and again, in a loop, for the whole duration of the base backup. That’s now done in the aforementioned tool, whose options have become a little more useful:
Usage: pg_basebackup.py [-v] [-f] [-j jobs] "dsn" dest

Options:
  -h, --help            show this help message and exit
  --version             show version and quit
  -x, --pg_xlog         backup the pg_xlog files
  -v, --verbose         be verbose about processing progress
  -d, --debug           show debug information, including SQL queries
  -f, --force           remove destination directory if it exists
  -j JOBS, --jobs=JOBS  how many helper jobs to launch
  -D DELAY, --delay=DELAY
                        pg_xlog subprocess loop delay, see -x
  -S, --slave           auxiliary process
  --stdin               get list of files to backup from stdin

Yeah, as implementing the xlog idea required having some kind of parallelism, I built on it: the script now has a --jobs option to set how many processes to launch in parallel, each fetching base backup files over its own standard (libpq) PostgreSQL connection, in compressed chunks of 8 MB (so it's not 8 MB chunks that actually get sent over the wire).
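To give an idea of the xlog part first, here's a minimal sketch of such a loop. It assumes psycopg2 on the client and the pg_ls_dir() / pg_read_binary_file() admin functions on the server, which is not necessarily how pg_basebackup.py reads the files; the sleep between rounds is what the -D/--delay option controls.

import os
import time
import psycopg2

def copy_xlog_in_a_loop(dsn, dest, stop_event, delay=2.0):
    # Fetch the pg_xlog files again and again for the whole duration of the
    # base backup, so WAL produced while the backup runs is not lost.
    # Sketch only: assumes pg_ls_dir()/pg_read_binary_file() are available.
    conn = psycopg2.connect(dsn)
    cur = conn.cursor()
    xlogdir = os.path.join(dest, "pg_xlog")
    os.makedirs(xlogdir, exist_ok=True)
    while not stop_event.is_set():          # main process sets this when done
        cur.execute("SELECT pg_ls_dir('pg_xlog')")
        for (name,) in cur.fetchall():
            if name == "archive_status":
                continue                    # skip the subdirectory, copy only segments
            cur.execute("SELECT pg_read_binary_file(%s)", ("pg_xlog/" + name,))
            data = cur.fetchone()[0]
            with open(os.path.join(xlogdir, name), "wb") as f:
                f.write(bytes(data))
        time.sleep(delay)                   # the -D/--delay loop delay
    conn.close()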
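And here's a rough sketch of the --jobs side of things, under the assumption of a server-side helper (called read_compressed_chunk() here, a made-up name standing in for whatever the script installs) that returns an 8 MB slice of a file already compressed, so less than 8 MB travels on the wire; each helper process opens its own libpq connection:

import os
import zlib
from multiprocessing import Pool
import psycopg2

CHUNK = 8 * 1024 * 1024   # 8 MB of raw data per round trip, compressed server-side

def fetch_file(args):
    # Worker: copy one file from the server over its own libpq connection.
    # read_compressed_chunk(path, offset, length) is a hypothetical helper
    # assumed to return a compressed bytea slice, NULL/empty past end of file.
    dsn, path, dest = args
    conn = psycopg2.connect(dsn)
    cur = conn.cursor()
    target = os.path.join(dest, path)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    offset = 0
    with open(target, "wb") as out:
        while True:
            cur.execute("SELECT read_compressed_chunk(%s, %s, %s)",
                        (path, offset, CHUNK))
            blob = cur.fetchone()[0]
            if not blob:
                break
            out.write(zlib.decompress(bytes(blob)))
            offset += CHUNK
    conn.close()

def parallel_base_backup(dsn, files, dest, jobs=4):
    # Spread the base backup file list over `jobs` helper processes (-j/--jobs).
    with Pool(processes=jobs) as pool:
        pool.map(fetch_file, [(dsn, f, dest) for f in files])

The real script has more to care about than that (error handling, the --stdin file list, the -S/--slave mode), but that's the general shape of the parallelism.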
