pg_basebackup
Hannu just gave me a good idea in this email on -hackers, proposing that pg_basebackup should get the xlog files again and again in a loop for the whole duration of the base backup. That's now done in the aforementioned tool, whose options got a little more useful now:
Usage: pg_basebackup.py [-v] [-f] [-j jobs] "dsn" dest

Options:
  -h, --help            show this help message and exit
  --version             show version and quit
  -x, --pg_xlog         backup the pg_xlog files
  -v, --verbose         be verbose about processing progress
  -d, --debug           show debug information, including SQL queries
  -f, --force           remove destination directory if it exists
  -j JOBS, --jobs=JOBS  how many helper jobs to launch
  -D DELAY, --delay=DELAY
                        pg_xlog subprocess loop delay, see -x
  -S, --slave           auxiliary process
  --stdin               get list of files to backup from stdin
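For example, a typical run with four helper jobs and the xlog loop enabled could look like this (the DSN and destination directory here are of course made up):

  pg_basebackup.py -v -x -j 4 "host=localhost user=postgres" /tmp/basebackup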
Yeah, as implementing the xlog idea required having some kind of parallelism, I built on it: the script now has a --jobs option for you to set up how many processes to launch in parallel, each fetching some base backup files in its own standard (libpq) PostgreSQL connection, in compressed chunks of 8 MB (so that's not 8 MB chunks sent over the wire).
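To make the chunking idea concrete, here is a minimal sketch (not the script's actual code), using only the standard library: a pool of worker processes, each handling its share of the file list and compressing 8 MB slices with zlib, so that what would travel is the compressed form of each chunk. In the real script each worker reads through its own libpq connection; fetch_file below is a hypothetical stand-in that reads locally.

import multiprocessing
import zlib

CHUNK_SIZE = 8 * 1024 * 1024  # chunks are 8 MB before compression

def fetch_file(path):
    # Hypothetical stand-in for the real per-connection transfer:
    # read the file in 8 MB chunks and compress each one.
    sent = 0
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            sent += len(zlib.compress(chunk))
    return path, sent

if __name__ == '__main__':
    files = ['base/16384/16385', 'base/16384/16386']  # made-up file names
    pool = multiprocessing.Pool(processes=4)          # the -j/--jobs knob
    for path, nbytes in pool.imap_unordered(fetch_file, files):
        print('%s: %d compressed bytes' % (path, nbytes))
    pool.close()
    pool.join()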
The xlog loop will fetch again, wholesale, any WAL file whose ctime has changed. It's easier this way, and tools offering more optimized behavior already exist, either walmgr or walreceiver.
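The loop itself is simple enough to sketch. Assuming a made-up fetch() callback and path, it boils down to remembering each file's ctime, sleeping for the --delay interval, and re-fetching whatever changed:

import os
import time

XLOG_DIR = '/path/to/pg_xlog'  # hypothetical location
DELAY = 2                      # the -D/--delay option, in seconds

def xlog_loop(fetch):
    # Re-fetch, wholesale, any WAL file whose ctime changed,
    # for as long as the base backup is running.
    seen = {}
    while True:
        for name in os.listdir(XLOG_DIR):
            ctime = os.stat(os.path.join(XLOG_DIR, name)).st_ctime
            if seen.get(name) != ctime:
                fetch(name)
                seen[name] = ctime
        time.sleep(DELAY)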
The script is still a little self-contained Python file; it just went from about 100 lines of code to about 400 lines. There's no external dependency: all it needs is provided by a standard Python installation. The problem with that is that it uses select.poll(), which I think is not available on Windows. Between supporting every system and adding dependencies, I've chosen what's easier for me.
import sys
import select

# Wait until there is something to read on standard input.
p = select.poll()
p.register(sys.stdin, select.POLLIN)
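For what it's worth, a Windows-friendly workaround (which the script does not use) would be a blocking reader thread feeding a queue, since select() on Windows only works on sockets:

import sys
import threading
try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2

lines = queue.Queue()

def reader():
    # Blocking reads in a thread work on every platform.
    for line in sys.stdin:
        lines.put(line)

t = threading.Thread(target=reader)
t.daemon = True
t.start()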
If you get to try it, please report back; you should know or can easily discover my email!