Implementing backups
I’ve been asked about my opinion on backup strategy and best practices, and it so happens that I have some kind of an opinion on the matter.
I tend to think best practice here begins with defining properly the backup plan you want to implement. It’s quite a complex matter, so be sure to ask yourself about your needs: what do you want to be protected from?
The two main things to want to protect from are hardware loss (crash
disaster, plane in the data center, fire, water flood, etc) and human error
(
UPDATE
without a where clause). Replication is an answer to the former,
archiving and dumps to the latter. You generally need both.
Often enough “backups” include
WAL
archiving and
shipping and nightly or
weekly
base backups, with some retention and some scripts or procedures
ready to setup
Point In Time Recovery and recover some data without
interfering with the WAL archiving and shipping. Of course with PostgreSQL
9.0 and 9.1, the
WAL Shipping can be implemented with
streaming replication
and you can even have a
Hot Standby. But for backups you still want
archiving.
Mostly I still implement
pg_dump -Fc
nightly backups with a custom retention
(for example, 1 backup a month kept 2 years, 1 backup a week kept 6 or 12
months, 1 backup a night kept 1 to 2 weeks), when the database size allow
the
pg_dump
run to remain constrained in the
maintenance window, if any.
Don’t forget that while
pg_dump
runs, you can’t roll out
DDL changes to the
production system any more, so you want to be careful about this
maintenance window thing. When you have one.
Physical backups are not locking rollouts away, but they often suck a good deal of the IO bandwidth so you need to pick up a right timing to do them. That’s how you can get to once a week base backup and WAL archiving.
If you can’t
pg_dump
production, maybe you can have
automated restore jobs
from the
physical backups that you then
pg_dump -Fc
, so that you still have
that. That can come up handy, really: you can’t test your
major upgrade out
of a
physical backup.
Also, obviously, never consider your backup strategy implemented until you have either automated restores in place or a regular schedule to exercise them ( staging instances, devel instances).
Then as far as the practical tools go, I tend to think that
pg_staging is
worth its installation complexity, and for WAL archiving and base backup I
recommend
walmgr from
Skytools, that’s a very handy tool. When using
PostgreSQL
9.0
or
9.1
, consider using
walmgr3 so that it’s behaving nice
alongside
streaming replication.