pg_restore -L & pg_staging
On the PostgreSQL Hackers mailing list, Andrew Dunstan just proposed some new options for pg_dump and pg_restore to ease our lives. One of the answers mentioned some scripts available to exploit the pg_restore listing that you play with using the options -l and -L, or their long versions --list and --use-list. The pg_staging tool allows you to easily exploit those lists too.
The pg_restore list is just a listing, one object per line, of all the objects contained in a custom dump, that is one made with pg_dump -Fc. You can then tweak this listing to comment out some objects (prepending a ; to the line where you find them), and give your hacked file back to pg_restore --use-list so that it skips them.
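Doing that by hand, without pg_staging, could look like this minimal sketch, where the archive name, the table name foo, and the target database are made up for the example:
$ pg_restore -l mydb.dump > mydb.list
$ # comment out (prefix with ;) the data entry for the hypothetical table foo
$ sed '/TABLE DATA public foo/ s/^/;/' mydb.list > short.list
$ pg_restore -L short.list -d target mydb.dump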
What’s pretty useful here, among other things, is that a table will in fact have more than one line in the listing: one for the TABLE definition, another one for the TABLE DATA. Thanks to that, pg_staging is able to provide you with options for restoring only some schemas, some schemas_nodata, and even some tablename_nodata_regexp, to use the configuration option names directly.
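To give an idea of what that looks like in the setup file, here is a hypothetical excerpt of a pg_staging.ini database section; the option names are the real ones we just talked about, but the section name and the values are only for illustration:
[postgres]
schemas = public, utils
schemas_nodata = archive
tablename_nodata_regexp = .*_audit,tmp_.*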
How do you do a very simple exclusion of some table’s data when restoring a dump, you ask? There we go. Let’s first prepare an environment, where I have only a PostgreSQL server running.
$ git clone git://github.com/dimitri/pg_staging.git
$ git clone git://github.com/dimitri/pgloader.git
$ for s in */*.sql; do psql -f "$s"; done
$ pg_dump -Fc > pgloader.dump
Now I have a dump with some nearly random SQL objects in it; let’s filter out the tables named reformat and parallel from it. We will take the sample setup from the pg_staging project. Going the quick route, we will not even change the default sample database name that’s used, which is postgres. After all, the catalog command of pg_staging that we’re using here is a developer command: you’re supposed to be using pg_staging for a lot more services than just this one.
$ cp pg_staging/pg_staging.ini .
$ (echo "schemas = public";
echo "tablename_nodata_regexp = parallel,reformat") \
>> pg_staging.ini
$ echo "catalog postgres pgloader.dump" \
| python pg_staging/pg_staging.py -c pg_staging.ini
; Archive created at Mon Aug 29 17:17:49 2011
;
; [EDITED OUTPUT]
;
; Selected TOC Entries:
;
3; 2615 2200 SCHEMA - public postgres
1864; 0 0 COMMENT - SCHEMA public postgres
1536; 1259 174935 TABLE public parallel dimitri
1537; 1259 174943 TABLE public partial dimitri
1538; 1259 174951 TABLE public reformat dimitri
;1853; 0 174935 TABLE DATA public parallel dimitri
1854; 0 174943 TABLE DATA public partial dimitri
;1855; 0 174951 TABLE DATA public reformat dimitri
1834; 2606 174942 CONSTRAINT public parallel_pkey dimitri
1836; 2606 174950 CONSTRAINT public partial_pkey dimitri
1838; 2606 174955 CONSTRAINT public reformat_pkey dimitri
We can see that the data for those tables is indeed commented out. Now, the actual pg_restore run looks like this:
$ createdb foo
$ echo "catalog postgres pgloader.dump" \
| python pg_staging/pg_staging.py -c pg_staging.ini > short.list
$ pg_restore -L short.list -d foo pgloader.dump
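As a quick check, the filtered tables should now exist in foo but contain no rows, something like:
$ psql -d foo -c "select count(*) from reformat"
 count 
-------
     0
(1 row)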
The little bonus with using pg_staging is that when filtering out a schema, it will track all tables and triggers from that schema, and also the functions used in the trigger definitions. Which is not as easy as it sounds, believe me!
The practical use case is when filtering out PGQ and Londiste: the PGQ triggers will then automatically be skipped by pg_staging rather than polluting the pg_restore logs because the CREATE TRIGGER command could not find the necessary implementation procedure.
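Configuration-wise, a hypothetical setup for that case could be as simple as not listing those schemas among the ones to restore:
schemas = public
Here pgq and londiste are left out on purpose; following the schema tracking just described, their triggers and the trigger implementation functions then get filtered out of the listing too.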