Thursday, May 16, 2013

Disk page checksums to detect filesystem failures in PostgreSQL 9.3Beta 1

New feature introduced in PostgreSQL 9.3Beta 1 i.e. "Disk page checksums". Thanks to author Ants Aasama and Simon Riggs, Jeff Davis,Greg Smith.

In earlier releases, if there's any data corruption block on disk it was silently ignored until any pointer arrives on it or some wrong results shown by the queries. Now, data corruption detected beforehand by triggering WARNING message instead of silently using or waiting for hit on the corrupted block.

Disk page checksums feature implementation is unique,  its not plug-able like EXTENSIONs its selectable feature. That's if you need your database should be under monitory umbrella of data corruption then it should be enabled at the time of cluster initialization not on existing or running cluster. Below's the example how it works.

Initialize the cluster with checksums:
initdb [OPTION]... [DATADIR]
   ........
   -k, --data-checksums      use data page checksums
   
initdb -D data_directory -k 
Now, any data corruption found will be notified as below:
postgres=# select * from corruption_test;
WARNING:  page verification failed, calculated checksum 63023 but expected 48009
ERROR:  invalid page in block 0 of relation base/12896/16409
In earlier version,just an error message.
postgres=# select * from corruption_test where id=1;
ERROR:  invalid page header in block 0 of relation base/12870/18192
That's cool right....

So, how do you know whether disk page checksums enabled on the cluster or not ?
As of now, there's no pg_catalog to store such information or any files created in the $PGDATA directory, only pg_control file will hold that information. Using pg_controldata utility you can know about it.
$ export PGDATA=/usr/local/pg93beta/data
$ pg_controldata 
....
....
....
Data page checksum version:           1
Some points on Disk page checksums:
1. Temp tables are excluded from checksums checks.
2. There's performance overhead if checksums enabled as per the PG documentation.
3. Once enabled checksums on a cluster cannot be rolled back.

Thanks
Raghav

2 comments:

Anonymous said...

According to the commit, the primary developer for the feature was Ants Aasama, and the work was reviewed by Jeff Davis and Simon Riggs. You should probably update your post to accurately reflect this.

Raghavendra said...

Sorry, thanks for correcting :). I have updated the blog accordingly.

Post a Comment