Tuesday, January 13, 2015

Configuring Automatic failover using Replication Manager 2.0 on PostgreSQL 9.3.5

In PostgreSQL high availability(Streaming Replication/Hot Standby), one of the thing require human interference and has no automation, that is in the event of master database crash; initiating failover procedure(trigger file creation or pg_ctl promote command) on the standby. Because in core of PostgreSQL there's no such built-in functionality to identify a failure of master and notify the standby. Thus, we require some automated tools to take over manual failover work or we have to dive into scripting land for writing our own script to do it.

Today, we have very good external tools to handle automatic failover like Replication Manager(repmgr),  EDB Failover Manager(EFM),  pgHA and HandyRep. Thanks to all for filling the gap of automatic failover in PostgreSQL.

In this post, am demonstrating Replication Manager on single node(localhost) on RHEL 6.5 - PostgreSQL 9.3.5. In order to achieve an easy and good understanding of concept I have compiled repmgr with EnterpriseDB One Click Installer(a pre-build binary package) instead of PG source.

To compile repmgr, we need to install few mandatory dependency packages gcc, postgresql-devel, libxslt-devel, pam-devel, libopenssl-devel, krb5-devel and libedit-devel by using yum or rpm. After installing dependencies, download repmgr 2.0 from here and set pg_config in your path and start compiling.
[postgres@localhost:/opt/PostgreSQL/9.3~]$ type pg_config
pg_config is hashed (/opt/PostgreSQL/9.3/bin/pg_config)

export PATH=/opt/PostgreSQL/9.3/bin:$PATH
export LD_LIBRARY_PATH=/opt/PostgreSQL/9.3/lib:$LD_LIBRARY_PATH

tar xvzf repmgr-2.0.tar.gz
cd repmgr
make USE_PGXS=1
make USE_PGXS=1 install
Mostly, repmgr compiles smoothly without any hiccups if we have installed all dependency packages, since am compiling against PG pre-build binaries, there may be diverse variants of libraries came with pre-build and rpm which might throw some compilation errors. Like one you see here:
/lib64/libldap_r-2.4.so.2: undefined reference to `ber_sockbuf_io_udp'
collect2: ld returned 1 exit status
make: *** [repmgrd] Error 1
To fix, find the checking library in /lib64/libldap_r-2.4.so.2.
[root@localhost repmgr-2.0]# cd /lib64/
[root@localhost lib64]# ls -l libldap*
lrwxrwxrwx. 1 root root     20 Dec  8 09:23 libldap-2.4.so.2 -> libldap-2.4.so.2.5.6
-rwxr-xr-x. 1 root root 317200 Apr 29  2013 libldap-2.4.so.2.5.6
lrwxrwxrwx. 1 root root     22 Dec  8 09:23 libldap_r-2.4.so.2 -> libldap_r-2.4.so.2.5.6
-rwxr-xr-x. 1 root root 335264 Apr 29  2013 libldap_r-2.4.so.2.5.6
Ok, there are two copies, retain one and unlink other.
[root@localhost lib64]# unlink libldap_r-2.4.so.2
Now clean the previous compilation by command "make USE_PGXS=1 clean" and proceed from step 1 and it will compile without any issues. Similarly, you need to fix for other library related errors. After installation you can find binaries, libraries and .SQL files related to repmgr in PostgreSQL locations.
[postgres@localhost:/opt/PostgreSQL/9.3/bin~]$ ls repmgr*     (Two utility commands)
repmgr  repmgrd

[postgres@localhost:/opt/PostgreSQL/9.3/lib/postgresql~]$ ls rep*  
repmgr_funcs.so

[postgres@localhost:/opt/PostgreSQL/9.3/share/postgresql/contrib~]$ ls
repmgr_funcs.sql  repmgr.sql  uninstall_repmgr_funcs.sql  uninstall_repmgr.sql
We are all set to setup automatic failover with a super-simple-toy Replication Manager. As a first step we need to have streaming replication(Refer to wiki) configured which I have done already on my localhost between two port 5432 (Master) and 5433 (Standby) lets use them. You can also try building standby using repmgr STANDBY CLONE command. Refer to repmgr documentation for more details.

Step 1. Enable repmgr libraries on both PostgreSQL instances that required for its backend functions.
Master Data Directory : /opt/PostgreSQL/9.3/data
Standby Data Directory: /opt/PostgreSQL/9.3/data_slave

Edit $PGDATA/postgresql.conf 
shared_preload_libraries = 'repmgr_funcs'

[postgres@localhost:/opt/PostgreSQL/9.3~]$ /opt/PostgreSQL/9.3/bin/pg_ctl -D /opt/PostgreSQL/9.3/data start

[postgres@localhost:/opt/PostgreSQL/9.3~]$ psql -p 5432 -c "show shared_preload_libraries;"
 shared_preload_libraries
--------------------------
 repmgr_funcs
(1 row)
Step 2. Repmgr need repmgr.conf file for each node, since we are working on localhost we need to keep each nodes repmgr.conf in seperate directory.
[postgres@localhost:/opt/PostgreSQL/9.3~]$ pwd
/opt/PostgreSQL/9.3
[postgres@localhost:/opt/PostgreSQL/9.3~]$ mkdir -p repmgr/master repmgr/standby
Step 3. Create repmgr.conf file for Master(5432) and Standby(5433) in the directories we created in Step 2.
[postgres@localhost:/opt/PostgreSQL/9.3/repmgr/master~]$ pwd
/opt/PostgreSQL/9.3/repmgr/master
[postgres@localhost:/opt/PostgreSQL/9.3/repmgr/master~]$ more repmgr.conf
cluster=test
node=1
node_name=master
conninfo='host=127.0.0.1 port=5432 dbname=postgres'
pg_bindir=/opt/PostgreSQL/9.3/bin
master_response_timeout=60 
reconnect_attempts=6
reconnect_interval=10
failover=automatic
promote_command='/opt/PostgreSQL/9.3/repmgr/auto_failover.sh'


[postgres@localhost:/opt/PostgreSQL/9.3/repmgr/slave~]$ pwd
/opt/PostgreSQL/9.3/repmgr/slave
[postgres@localhost:/opt/PostgreSQL/9.3/repmgr/slave~]$ more repmgr.conf
cluster=test
node=2
node_name=slave
conninfo='host=127.0.0.1 port=5433 dbname=postgres'
pg_bindir=/opt/PostgreSQL/9.3/bin
master_response_timeout=60
reconnect_attempts=6
reconnect_interval=10
failover=automatic
loglevel=DEBUG
promote_command='/opt/PostgreSQL/9.3/repmgr/auto_failover.sh'
What mainly you have to observe in repmgr.conf is "master_reponse_timeout" which's total wait duration in seconds before declaring master has disappeared. In that duration 6 reconnect attempts made with 10 seconds of interval. After no response from master in "master_response_timeout" duration automatic failover takes place by promote_command script. The script consist of Standby promotion steps, which I have created one for this setup shared below.
[postgres@localhost:/opt/PostgreSQL/9.3~]$ more repmgr/auto_failover.sh
#!/bin/bash
echo "Promoting Standby at `date '+%Y-%m-%d %H:%M:%S'`" >>/tmp/repsetup.log
/opt/PostgreSQL/9.3/bin/pg_ctl -D /opt/PostgreSQL/9.3/data_slave promote >>/tmp/repsetup.log
Step 4. Register Master and Standby node with repmgr using "repmgr" utility.
[postgres@localhost:/opt/PostgreSQL/9.3~]$ repmgr -f repmgr/master/repmgr.conf --verbose master register >/tmp/repsetup.log 2>&1
Master register logs (/tmp/repsetup.log):
[2015-01-12 01:28:55] [INFO] repmgr connecting to master database
[2015-01-12 01:28:55] [INFO] repmgr connected to master, checking its state
[2015-01-12 01:28:55] [INFO] master register: creating database objects inside the repmgr_test schema
[2015-01-12 01:28:55] [DEBUG] master register: CREATE SCHEMA repmgr_test
[2015-01-12 01:28:55] [DEBUG] master register: CREATE TABLE repmgr_test.repl_nodes (          [2015-01-12 01:28:55] [DEBUG] master register: CREATE TABLE repmgr_test.repl_monitor (   [2015-01-12 01:28:55] [DEBUG] master register: CREATE VIEW repmgr_test.repl_status AS  [2015-01-12 01:28:55] [DEBUG] master register: CREATE INDEX idx_repl_status_sort     ON repmgr_test.repl_monitor (last_monitor_time, standby_node)
[2015-01-12 01:28:55] [DEBUG] master register: INSERT INTO repmgr_test.repl_nodes (id, cluster, name, conninfo, priority) VALUES (1, 'test', 'master', 'host=1
27.0.0.1 port=5432 dbname=postgres', 0)
[2015-01-12 01:28:55] [NOTICE] Master node correctly registered for cluster test with id 1 (conninfo: host=127.0.0.1 port=5432 dbname=postgres)
Opening configuration file: repmgr/master/repmgr.conf
Standby:
[postgres@localhost:/opt/PostgreSQL/9.3~]$ repmgr -f repmgr/slave/repmgr.conf --verbose standby register >>/tmp/repsetup.log 2>&1
Standby register logs (/tmp/repsetup.log):
[2015-01-12 01:30:37] [INFO] repmgr connecting to standby database
[2015-01-12 01:30:37] [INFO] repmgr connected to standby, checking its state
[2015-01-12 01:30:37] [DEBUG] standby register: SELECT 1 FROM pg_namespace WHERE nspname = 'repmgr_test'
[2015-01-12 01:30:37] [INFO] repmgr connecting to master database
[2015-01-12 01:30:37] [INFO] finding node list for cluster 'test'
[2015-01-12 01:30:37] [INFO] checking role of cluster node 'host=127.0.0.1 port=5432 dbname=postgres'
[2015-01-12 01:30:37] [INFO] repmgr connected to master, checking its state
[2015-01-12 01:30:37] [INFO] repmgr registering the standby
[2015-01-12 01:30:37] [DEBUG] standby register: INSERT INTO repmgr_test.repl_nodes(id, cluster, name, conninfo, priority) VALUES (2, 'test', 'slave', 'host=12
7.0.0.1 port=5433 dbname=postgres', 0)
[2015-01-12 01:30:37] [INFO] repmgr registering the standby complete
[2015-01-12 01:30:37] [NOTICE] Standby node correctly registered for cluster test with id 2 (conninfo: host=127.0.0.1 port=5433 dbname=postgres)
Opening configuration file: repmgr/slave/repmgr.conf
By looking to the logs you can easily notice, repmgr creating its own schema in database with "repmgr_$CLUSTER" name and some tables/view/functions in it. And it create one row regarding replication lag in repl_monitor table. Repmgr has a utility called "repmgrd" to monitor Master availability running daemon process from Standby node. Its also a management and monitoring system daemon that watches the cluster status and can trigger standby promotion. We need to start "repmgrd" daemon process after registering master/standby nodes with repmgr.

Step 5. Lets start the repmgr daemon process to watch master, in our case we are not running standby on separate node hence we need to start the daemon with repmgr/standby/repmgr.conf file.
repmgrd -f repmgr/slave/repmgr.conf --verbose --monitoring-history >>/tmp/repsetup.log 2>&1 &

Logs (/tmp/repsetup.log)

[2015-01-12 01:42:13] [INFO] repmgrd Connecting to database 'host=127.0.0.1 port=5433 dbname=postgres'
[2015-01-12 01:42:13] [INFO] repmgrd Connected to database, checking its state
[2015-01-12 01:42:13] [INFO] repmgrd Connecting to primary for cluster 'test'
[2015-01-12 01:42:13] [INFO] finding node list for cluster 'test'
[2015-01-12 01:42:13] [INFO] checking role of cluster node 'host=127.0.0.1 port=5432 dbname=postgres'
[2015-01-12 01:42:13] [INFO] repmgrd Checking cluster configuration with schema 'repmgr_test'
[2015-01-12 01:42:13] [INFO] repmgrd Checking node 2 in cluster 'test'
[2015-01-12 01:42:13] [INFO] Reloading configuration file and updating repmgr tables
[2015-01-12 01:42:13] [INFO] repmgrd Starting continuous standby node monitoring
[2015-01-12 01:42:14] [DEBUG] standby_monitor: INSERT INTO repmgr_test.repl_monitor VALUES(1, 2, '2015-01-12 09:42:14.457287+00'::timestamp with time zone,  '2015-01-12 09:42:13.950172+00'::timestamp with time zone, '2/C84DAB08', '2/C84DAB08',  0, 0)
Step 6. Monitor nodes registered with repmgr and running daemon process.
[postgres@localhost:/opt/PostgreSQL/9.3~]$ psql -p 5432 -d postgres -xc "SELECT * FROM repmgr_test.repl_status"
-[ RECORD 1 ]-------------+------------------------------
primary_node              | 1
standby_node              | 2
standby_name              | slave
last_monitor_time         | 2015-01-12 09:42:29.514056+00
last_wal_primary_location | 2/C84DB7A8
last_wal_standby_location | 2/C84DB7A8
replication_lag           | 0 bytes
replication_time_lag      | 00:04:01.960772
apply_lag                 | 0 bytes
communication_time_lag    | 00:03:59.45349

[postgres@localhost:/opt/PostgreSQL/9.3~]$ repmgr -f repmgr/master/repmgr.conf cluster show
[2015-01-12 01:47:00] [INFO] repmgr connecting to database
Role      | Connection String
* master  | host=127.0.0.1 port=5432 dbname=postgres
  standby | host=127.0.0.1 port=5433 dbname=postgres
Step 7. Lets simulate failover scenario by taking master down. I am doing some stupid way by killing postmaster.pid. Please don't do the same on production if you want to stick for long time in the same company :).
[postgres@localhost:/opt/PostgreSQL/9.3~]$ kill `head -n1 data/postmaster.pid`
or, you can take master down safely. 
[postgres@localhost:/opt/PostgreSQL/9.3~]$ pg_ctl -D /opt/PostgreSQL/9.3/data stop -mf
Now check the logs, how repmgr has promted the standby:
[2015-01-12 02:14:11] [WARNING] Can't stop current query: PQcancel() -- connect() failed: Connection refused
[2015-01-12 02:14:11] [WARNING] repmgrd: Connection to master has been lost, trying to recover... 10 seconds before failover decision
...
[2015-01-12 02:14:21] [ERROR] repmgrd: We couldn't reconnect for long enough, exiting...
[2015-01-12 02:14:21] [DEBUG] repmgrd: there are 2 nodes registered
[2015-01-12 02:14:21] [DEBUG] repmgrd: node=1 conninfo="host=127.0.0.1 port=5432 dbname=postgres" witness=false
[2015-01-12 02:14:21] [ERROR] Connection to database failed: could not connect to server: Connection refused
        Is the server running on host "127.0.0.1" and accepting
        TCP/IP connections on port 5432?
[2015-01-12 02:14:21] [DEBUG] repmgrd: node=2 conninfo="host=127.0.0.1 port=5433 dbname=postgres" witness=false
[2015-01-12 02:14:21] [DEBUG] Total nodes counted: registered=2, visible=1
[2015-01-12 02:14:21] [DEBUG] XLog position of node 2: log id=2 (2), offset=3360733048 (C850B778)
[2015-01-12 02:14:21] [DEBUG] Last XLog position of node 2: log id=2 (2), offset=3360733048 (C850B778)
[2015-01-12 02:14:26] [INFO] repmgrd: This node is the best candidate to be the new primary, promoting...
[2015-01-12 02:14:26] [DEBUG] promote command is: "/opt/PostgreSQL/9.3/repmgr/auto_failover.sh"
Promoting Standby at 2015-01-12 02:14:26
server promoting
[2015-01-12 02:14:29] [INFO] repmgrd Checking cluster configuration with schema 'repmgr_test'
[2015-01-12 02:14:29] [INFO] repmgrd Checking node 2 in cluster 'test'
[2015-01-12 02:14:29] [INFO] Reloading configuration file and updating repmgr tables
[2015-01-12 02:14:29] [INFO] repmgrd Starting continuous primary connection check
Perfect, repmgr daemon recognised master failure and before promoting standby by "auto_failover.sh" script it has properly verified the WAL's location as well. Lets verify whether Standby had promotion or not ?
[postgres@localhost:/opt/PostgreSQL/9.3~]$ psql -p 5433 -c "select pg_is_in_recovery();"
 pg_is_in_recovery
-------------------
 f
(1 row)
Incredible, without any human intervention manual failover procedure taken care by repmgr. You can rebuild the dead master as standby either by repmgr itself or you can follow the switchback procedure shown in my previous post. Same way you can setup for two nodes, by executing standby steps on standby node instead of localhost.

Thanks for reading.
Raghav