Hannu just gave me a good idea in this email on -hackers, proposing that
pg_basebackup should get the xlog files again and again in a loop for the
whole duration of the base backup. That's now done in the aforementioned
tool, whose options got a little more useful now:
Usage: pg_basebackup.py [-v] [-f] [-j jobs] "dsn" dest
Options:
-h, --help show this help message and exit
--version show version and quit
-x, --pg_xlog backup the pg_xlog files
-v, --verbose be verbose and about processing progress
-d, --debug show debug information, including SQL queries
-f, --force remove destination directory if it exists
-j JOBS, --jobs=JOBS how many helper jobs to launch
-D DELAY, --delay=DELAY
pg_xlog subprocess loop delay, see -x
-S, --slave auxilliary process
--stdin get list of files to backup from stdin
Yeah, as implementing the xlog idea required having some kind of
parallelism, I built on it and the script now has a --jobs option for you to
setup how many processes to launch in parallel, all fetching some base
backup files in its own standard (libpq) PostgreSQL connection, in
compressed chunks of 8 MB (so that's not 8 MB chunks sent over).
The xlog loop will fetch any WAL file whose ctime changed again,
wholesale. It's easier this way, and tools to get optimized behavior already
do exist, either walmgr or walreceiver.
The script is still a little python self-contained short file, it just went
from about 100 lines of code to about 400 lines. There's no external
dependency, all it needs is provided by a standard python installation. The
problem with that is that it's using select.poll() that I think is not
available on windows. Supporting every system or adding to the dependencies,
I've been choosing what's easier for me.
import select
p = select.poll()
p.register(sys.stdin, select.POLLIN)
If you get to try it, please report about it, you should know or easily discover my email!
