Monday, May 2, 2011

PostgreSQL 9.0 Backup & Recovery


In PostgreSQL, Backup & Recovery are very user friendly comparing with other database. Many of them won't agree to this, ok lets not get into debate. Coming to Backups, PostgreSQL does'nt support INCREMENTAL BACKUP, however there are very consistant backup tools and OS level work-around to achieve this goal.

My pictorial presentation on PostgreSQL Backup and Recovery gives a complete conceptial idea. Looking into Diagram you can make out which backups can be used to restore or recover.

Logical Backup

pg_dump,pg_restore and pg_dumpall utilities used for logical backups. pg_dump and pg_restore will help in taking backups for Database level, Schema level and Table Level. Pg_dumpall used for cluster level dump.

Three formats are supported with pg_dump, Plain SQL format, Custom Format and Tar Format. Custom and Tar Format dumps are compatible with pg_restore utility whereas Plain SQL format dumps are compatible with psql utility for restoration.

Below are the examples for each Backup levels and with related restore commands.

Note: Set the defaults for PGDATABASE, PGUSER, PGPASSWORD and PGPORT in .bash_profile(Environment Variables in Windows)

Plain SQL Format Dump and Restore
$ pg_dump -U username -Fp dbname  >  filename
or
$ pg_dump -U username dbname -f  filename
or
$ pg_dump -Fp -U username dbname -f  filename

For restoring use psql command

$ psql -U username -f filename dbname
or
postgres=# \i SQL-file-name     //in psql terminal with \i option
Custom Format
$ pg_dump -Fc dbname -f filename
$ pg_restore -Fc -U username -d dbname filename.dmp
Tar Format
$ pg_dump -Ft dbname -f filename
$ pg_restore -U username -d dbname filename
or
$ cat tar-file.tar | psql -U username dbname
Note: Schema Level and Tables Level dumps can be performed in the same way by adding related options.

Cluster Level Dump:
$pg_dumpall -p portnumber > filename

For restoring use psql command

$ psql -f filename
There are very best way of taking dumps and restoring methodolgies. In particular, Simon Riggs and Hannu Krosing - "PostgreSQL 9 Administration Cookbook - 2010" book is good way to start with PostgreSQL Backup and Recovery published by www.2ndQuadrant.com.

Physical Backup (File system Backup)

Cold Backup:

In cold backup, its a simple file system backup of /data directory when Postgres Instance is down, mean, to achieve a self-consistent data directory backup, the database server should be shut down before copying. PostgreSQL gives flexibility to keep pg_xlog and pg_tblspce in different mount points via softlink. While copying the /data directory including the soft link's data, use the below command.
tar czf backup.tar.gz $PGDATA
or
cp -r $PGDATA /backup/
or
rsync -a $PGDATA /wherever/data

Hot Backup (Online Backup):

In Hot Backup, cluster will be up and running and the Database should be in Archive Log Mode. Two system functions will notify the instance about starting and stopping the Hot Backup process(pg_start_backup(),pg_stop_backup()). Before going forward with Online Backup, let's discuss on the Database Archive Log mode which is mandatory for Online Backups.

Enabling WAL Archiving:

Coming posts of mine will brief about PITR / Tunning WAL etc., presently we look into WAL Archiving. In PostgreSQL database system, the actual database 'writes' to an addition file called write-ahead log (WAL) to disk. It contains a record of writes that made in the database system. In the case of Crash, database can be repaired/recovered from these records.

Normally, the write-ahead log logs at regular intervals (called Checkpoints) matched against the database and then deleted because it no longer is required. You can also use the WAL as a backup because,there is a record of all writes made to the database.

Concept of WAL Archiving:

The write-ahead log is composed of each 16 MB large, which are called segments. The WALs reside under pg_xlog directory and it is the subdirectory of 'data directory'. The filenames will have numerically named in ascending order by PostgreSQL Instance. To perform a backup on the basis of WAL, one needs a basic backup that is, a complete backup of the data directory, and the WAL Segments between the base backup and the current date.

Configuring the archiving of WAL segments can be chosen by setting the two configuration parameter's archive_command and archive_mode in the postgresql.conf. Making the cluster into Archive-log mode requires RESTART.
archive_mode= on/off (boolean parameter)
archive_command = 'cp –i %p / Archive/Location/ f% '
Note: % p for the file to copy with path used as a file name and % f without a directory entry for the destination file.

For further information about the Archiver Process, refer to the post PostgreSQL 9.0 Memory & Processess.

Online Backup :

To take online backup:
Step 1 : Issue pg_start_backup('lable') in the psql terminal
postgres=# select pg_start_backup('fb');
Step 2 : OS level copy the $PGDATA directory to any Backup Location
$ cp -r $PGDATA  /anylocation
Step 3 : Issue pg_stop_backup() in psql terminal.
postgres=# select pg_stop_backup();
Note: It is not necessary that these two functions should run in the same database connection. The backup mode is global and persistent.

In PostgreSQL, there is no catalog to store the Start and Stop time of the Online backup. However, when online backup is in process, couple of the files created and deleted.

pg_start_backup('label') and pg_stop_backup are the two system functions to perform the Online Backup. With pg_start_backup('label') a file backup_label is created under $PGDATA directory and with pg_stop_backup() a file 'wal-segement-number.backup' file created under $PGDATA/pg_xlog. Backup_label will give the start time and Checkpoint location of WAL Segment, it will also notify the PostgreSQL instance that Cluster is in BACKUP-MODE. 'wal-segment-number.backup' file under $PGDATA/pg_xlog directory describes the start and stop time, Checkpoint location with WAL segement number.

Note: After pg_stop_backup(), backup_label file is deleted by the PostgreSQL instance.

Do post your comments, suggestions.

--Raghav


6 comments :

Rohit said...

Hello Raghavendra ,

I was looking at your PG related blog, and it is quite nice.

Btw I am lead architect for a major energy management platform development, and we are looking for a Postgres DBA.

If you are interested and wish to know more, please do mail me at rohit dot sri at gmail dot com.

Thanks.

Anonymous said...

I have the feeling that Postgres is proceeding towards Oracle, but at the very early stage of Oracle, like 8. for example, the replication function of Postgres, like Oracle standby in 8i, but not as good as Oracle. still a lot of manual steps required and not so straightward.

Jean-Christophe Arnu said...

Maybe you can also talk of continuous archiving starting from a PITR and using archives to get as close as the restoration point you want. This is done by archive_command key in postgresql.conf

Anonymous said...

Hey nice writeup, helped me to understand a little better.

Question, what if have only a Base backup from 6 months ago and no WAL logs.. Can I restore from this Base backup successfully? This is something I can find no documentation on.

Raghavendra said...

You can restore any consistent base backup irrespective of at what time the backup has taken. Question only comes, if you want to restore the base backup and bring to consistent state of your current production, then you need WAL's to apply and do PITR. You can refer to my PITR section of blog for your reference. Thank you for visiting.

Unknown said...

Good explanation, thanks. But as for me, I don't like a command line interface, so I prefer this free simple GUI tool for the PostgreSQL Backup which allow to schedule the backup and store it on a various cloud storage(Dropbox, Google Drive, Amazon S3) or locally. http://postgresql-backup.com/

Post a Comment

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License