Wednesday, October 19, 2011

High Availability Clustering with PostgreSQL


Firstly, I should thank my company for giving me an opportunity to work mostly with PostgreSQL HA stuff. I have worked with very good clients who has implemented Clustering with PostgreSQL. So, my article here is to give little idea on how HA clustering will work with PostgreSQL.

PostgreSQL has built-in functionality for High Availability like Warm Standby,Hot Standby and Streaming Replication. But, missing few features like Switchover/Switchback, failover automation, minimal downtime etc., which are mostly demanded by the companies. Postgres community member's are working on the demands aggressively and hope we see very new PostgreSQL soon with all features bundled. For now, let see Clustering with PostgreSQL.


There are many clustering architecture diagram's in brief which I have shared links below, but what I made here is just an overview of it. 

What is High Availablity clustering ? 

High availability clustering (HAC)is a feature which provides redundancy and fault tolerance. Its a number of connected devices processing and providing a service. HAC, involves employing both hardware and software technologies, like Server redundancy(including application failover and server clustering), Storage redundancy (including RAID and I/O multipathing), Network redundancy and Power system redundancy.

It's goal is to ensure this service is always available even in the event of a failure. If one server fail's the other servers will continue processing and take on the processing load of the failed server. HA cluster implementation attempt to use redundancy of cluster components to eliminate single points of failure.

Currently Available HA Products

There are many competitive high-availability software products in the market today; deciding which one to purchase can be tough. Following are the list of features you need to look in any HAC product.
  • Clustering capability ( How many servers can be clustered together?)
  • Load-balancing capability 
  • Intelligent monitoring 
  • Centralized management capability
  • Application monitoring
  • Cost (Most importantly though :) )
  • Customer support (Most of the products do this)
I have seen two of them, one is RedHat Cluster Suite (which is commonly used HA package for Linux operating system) and another is Steeleye-LifeKeeper.


Who needs ?

High Availability Clusters are often used by websites serving 24x7x365 not affording any downtime Eg: Amazon.com,Music websites, Customer Service sites etc., or Companies with  Critical Databases.

How it works ?

You need minimum two nodes to start with HA. HA clusters usually use a heartbeat private network connection which is used to monitor the health and status of each node in the cluster. In any serious condition, any of the cluster goes down then other node attempts to start services and provides the same service. 

Types of HAC :
  • Active/Passive: In this mode, one node is active (i.e., Primary) and processing service, while other node will be in passive mode meaning its a standby and will only become active if the primary node fails.
  • Active/Active : In this mode, both nodes are active and traffic is load balanced between both nodes and processing service. If one node fails, the other node will take the full processing load, until the failed node becomes active again.
Note: Active/Active mode is not supported with PostgreSQL.


Heartbeat:

A heartbeat is a sensing mechanism which sends a signal across to the primary node, and if the primary node stops responding to the heartbeat for a predefined amount of time, then a failover occurs automatically.

Failover Automation:

Automatic failover is the process of moving active services from the primary node to the standby node when the primary node fails. Usually the standby node continues its services until the primary node has come back up and running. When a device fails another device takes over this process which is referred to as a failover. 

Failover automation is usually implemented on hardware firewalls over networks. You need to configure firewalls on Primary to take over Standby node in case of primary firewall fails. 

HAC support with PostgreSQL

Currently, RHCS or LifeKeeper supports Active/Passive clustering with PostgreSQL. There is no Active/Active support for PostgreSQL yet. As I said, PostgreSQL has no built-in functionality of Failover Automation including third party replication tools like Slony-I, Londiste, etc.. To achieve this you may need to trick with OS level Scripting or take the help of Clustering. 

Below link will help you to understand more about PostgreSQL Clustering with RHCS by Devrim Gunduz(Postgres Community Member).


Setup service from EnterpriseDB on RHCS:


Do post your comments..

--Raghav




6 comments :

Anonymous said...

Hi,

Good explanation about the limitations of Postgres HA. Please share the procedure, as how to configure Postgres HA using RHCS, including the component versions. These details are not available even in the Devrin Gunduz pdf.

Thanks

Unknown said...

Hi Raghav,
How many standby server for one primary server ?
Thanks.

Unknown said...

I found my answer: http://raghavt.blogspot.com/2011/10/replication-in-postgresql-90.html

Thanks.

Anonymous said...

Please share the procedure, as how to configure and implement Postgres HA using RHCS

Anonymous said...

I don't think there's anything that anyone can share unfortunately...just countless words all over and the closest I got (without paying) was https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Cluster_Suite_Overview/s1-service-management-overview-CSO.html

Other than that you could use EnterpriseDB l"Setup service from EnterpriseDB on RHCS" from link above http://www.enterprisedb.com/services/packaged-services/high-availability where they say "Our Knowledge Is Your Gain" but...you pay for it - nothing is for free other than what you do on your own and even that is at your own expense.

Unknown said...

HI,

Is there a way to automate pg_basebackup process when the main server recovers from a failover so that the latest entries are copied from standby (now main) server.

Regards

Post a Comment

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License