Oracle High Availability – ASM, Clusterware, Cold Failover

October 11, 2013

In my previous article, High Availability and SLA requirements for Oracle database, we discussed that successful High Availability (HA) begins with understanding the Service Level Agreements (SLA) the business requires, along with each of their dimensions. This understanding guides important IT technology decisions and determines the appropriate level of investment in HA architecture. Choosing the right technical solution when designing a database system from scratch is difficult, but you can follow best practices for building Highly Available Oracle database systems based on Availability Levels that match database industry standards. In this article I’ll share some best practices and my experience in architecting database systems for the first two and a half Availability Levels (Levels 0, 1 and 2a), describing technical solutions based on SAN, clusters, Clusterware, Oracle ASM and Cold Failover Cluster (CFC).

I’ll start with the graph below, which illustrates the Availability Continuum and depicts the increase in availability gained by progressing from one level to the next. It is not based on empirical data, and the percentage values are for illustrative purposes only, although in my experience they are close to real figures in common IT infrastructure for Oracle database environments.

Another way to interpret the y-axis scale is as a measure of acceptable downtime: the lower end of the axis represents a reasonable amount of downtime as tolerable, whereas the upper end represents even the smallest amount of downtime as intolerable.
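To make the y-axis concrete, an availability percentage can be translated into the downtime it permits. The Python sketch below assumes round-the-clock service over a 365-day year; the function name and the sample percentages are my own choices for illustration, not part of any Oracle tool.

```python
# Convert an availability percentage into the downtime it allows,
# assuming 24x7 service over a 365-day year (8760 hours).
HOURS_PER_YEAR = 365 * 24


def allowed_downtime_hours(availability_pct: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the maximum downtime, in hours, permitted over the period."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    return period_hours * (100.0 - availability_pct) / 100.0


if __name__ == "__main__":
    for pct in (95.0, 99.0, 99.9, 99.99, 99.999):
        hours = allowed_downtime_hours(pct)
        print(f"{pct:>7}% -> {hours:8.2f} h/year ({hours * 60:9.1f} min/year)")
```

By this arithmetic, the roughly 95% quoted for Level 0 allows about 438 hours (more than 18 days) of downtime per year, while 99.99% allows less than an hour.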

[Figure: Availability Continuum – Oracle database High Availability Levels]

Yet another way to think of the y-axis of the Availability Continuum graph is in terms of cost: the higher the demanded availability, and the less acceptable downtime becomes, the higher the cost of achieving the desired level of availability.

For every Availability Level in the picture (sometimes called Tiers) I’ll present concrete technical examples and IT solutions for building Oracle database environments. The levels of availability build on each other, each expanding the redundancy and recovery options of the previous level.

Availability Level 0: Out-of-the-box

Level 0 availability is the out-of-the-box configuration of a single Oracle database instance with no specific High Availability (HA) elements. That is, you take a single server and install a single Oracle database instance on a normal file system without mirrored disks. Even with no action taken other than installing the product, availability is considered to be around 95%. Of course, different servers have different availability specifications; as an example, you can achieve 95% availability with this configuration on a single HP ProLiant server running Linux. Even within a single Oracle database instance you can use many Oracle High Availability (HA) features and tools; a partial list follows:

– Multiplex production redo logs and control files
– Enable ARCHIVELOG mode and use a flash recovery area
– Log checkpoints to the alert log
– Enable block checking
– Automatic Undo Management
– Locally managed tablespaces
– Automatic Segment Space Management
– Resumable space allocation
– Database resource manager
– Temporary tablespaces with tempfiles
– Oracle Restart
– Flashback Technology including Flashback database
– Recovery Manager (RMAN)
– Fast-Start Fault Recovery
– Data Recovery Advisor
– Online reorganization & redefinition
– Online patching
– Dynamic database reconfiguration
– Automatic Diagnostic Repository (ADR)
– HARD – Oracle Hardware Assisted Resilient Data
– ..

In this configuration, in addition to the built-in database features, it is very efficient to use the Oracle Restart feature that comes as part of Oracle Grid Infrastructure. Oracle Restart implements a High Availability (HA) solution for single, non-clustered instance environments only. It monitors the health of the following components and automatically restarts them: database instances, the Oracle Net listener, database services, the ASM instance and ASM disk groups.
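The restart behaviour described above can be pictured as a dependency-ordered check-and-restart pass. The Python sketch below is a hypothetical model, not Oracle Restart itself: the component names, the dependency map and the `max_attempts` retry limit are my assumptions, and `start` stands in for whatever actually starts a component. It uses the standard-library `graphlib` module (Python 3.9+) so that, for example, an ASM disk group is only started once the ASM instance is up.

```python
# Hypothetical model of an Oracle Restart-style monitoring pass: components
# that are down are restarted in dependency order, with a retry limit.
# Names, dependencies and max_attempts are illustrative assumptions.
from collections.abc import Callable
from graphlib import TopologicalSorter

# component -> components it needs before it can start
DEPENDENCIES: dict[str, set[str]] = {
    "asm_instance": set(),
    "diskgroup_DATA": {"asm_instance"},
    "diskgroup_FRA": {"asm_instance"},
    "listener": set(),
    "db_instance": {"diskgroup_DATA", "diskgroup_FRA"},
    "db_service": {"db_instance", "listener"},
}


def restart_plan(running: set[str], deps: dict[str, set[str]]) -> list[str]:
    """Stopped components, ordered so that dependencies come first."""
    order = TopologicalSorter(deps).static_order()
    return [name for name in order if name not in running]


def monitor_cycle(running: set[str], start: Callable[[str], bool],
                  deps: dict[str, set[str]] = DEPENDENCIES,
                  max_attempts: int = 2) -> list[str]:
    """One pass: try to start each stopped component; return those started."""
    started = []
    for name in restart_plan(running, deps):
        if not deps[name] <= running:
            continue  # a dependency is still down; retry on the next pass
        for _ in range(max_attempts):
            if start(name):
                running.add(name)
                started.append(name)
                break
    return started
```

With a start function that always succeeds and only the listener running, one pass starts the ASM instance before the disk groups and the database instance before its service. If the ASM instance cannot be started, everything that depends on it is simply left for the next pass.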

Availability Level 1: Storage-Level Protection – SAN / LVM / ASM

[Figure: Oracle ASM – 2 disk groups for several databases]

Level 1 availability involves a single database instance with protected storage (SAN, LVM and/or Oracle ASM). This provides some protection against storage-level failures, but there is no redundancy for server components. At this level you should consider the different options for storage-level protection. The topic is not as simple as it seems at first glance: you have to consider many things and answer certain questions before you start building a database system. I cover some of them below.

How many file systems, disk groups, disks are enough?

Most of the time, two disk groups are enough for a single database, or even for several databases sharing the same storage array (SAN). One disk group holds the Oracle data files (ORA_DATA). The second disk group serves as the common Oracle Fast Recovery Area (ORA_FRA), which gives you a backup of your data. Keeping the number of disk groups this small lets you put as many Logical Unit Numbers (LUNs) as possible into each disk group as ASM disks, and because ASM stripes data across all disks in a disk group, this gives you the best performance.
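Why more LUNs per disk group helps can be shown with a deliberately simplified model: assume a file’s allocation units (AUs) are placed round-robin across every disk in its disk group. Real ASM allocation is more nuanced, and the 12 LUNs and 600-AU file used here are hypothetical numbers, but the trend is the same.

```python
# Simplified model, NOT real ASM allocation: a file's allocation units
# (AUs) are spread round-robin over all disks of its disk group.


def disks_per_group(total_luns: int, disk_groups: int) -> int:
    """Disks per group when the LUNs are divided evenly between groups."""
    if disk_groups < 1 or total_luns < disk_groups:
        raise ValueError("need at least one LUN per disk group")
    return total_luns // disk_groups


def aus_per_disk(file_aus: int, disks_in_group: int) -> list[int]:
    """How many of the file's AUs land on each disk of the group."""
    if disks_in_group < 1:
        raise ValueError("a disk group needs at least one disk")
    base, extra = divmod(file_aus, disks_in_group)
    return [base + 1 if i < extra else base for i in range(disks_in_group)]


if __name__ == "__main__":
    total_luns, file_aus = 12, 600  # hypothetical sizes
    for groups in (2, 6):
        disks = disks_per_group(total_luns, groups)
        busiest = max(aus_per_disk(file_aus, disks))
        print(f"{groups} disk groups: {disks} disks each, busiest disk holds {busiest} AUs")
```

With 12 LUNs split into two disk groups (6 disks each), a 600-AU data file puts 100 AUs on each disk; splitting the same LUNs into six disk groups leaves 2 disks per group and 300 AUs on each, so every disk carries three times the I/O for that file.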

Which RAID Configuration to Use for HA?

If you use hardware mirroring, Oracle recommends external redundancy disk groups where possible, to avoid unnecessary overhead on the server. It’s best to use LUNs with the same performance characteristics and capacity, and to maximize the number of spindles in your SAN and Oracle ASM disk groups (although this is a pain for SAN admins). Again, the choice depends on your business requirements and budget; the major options are listed below:

a) Hardware RAID1 (mirroring; best performance; best choice for modern SAN)

b) RAID5 (parity protection, more economical solution, not for write intensive workloads or redo logs)

c) Oracle ASM mirroring (best choice for low cost storage; enables extended clustering solutions)

d) Combining Oracle ASM mirroring with hardware mirroring is NOT recommended.
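The trade-offs between options a) to c) can be roughly quantified. The Python sketch below compares usable capacity and the back-end disk I/Os caused by one small random write, under simplifying assumptions: equal-size disks, a single RAID5 set, no hot spares, no ASM free-space reserve for rebalancing, and no controller write cache masking the RAID5 penalty. The 8-disk, 8 TB example is hypothetical.

```python
# Rough comparison of the protection options; see assumptions above.
from dataclasses import dataclass


@dataclass(frozen=True)
class Protection:
    name: str
    usable_fraction: float           # share of raw capacity left for data
    backend_ios_per_host_write: int  # disk I/Os caused by one small random write


def protection_options(raid5_disks: int) -> list[Protection]:
    if raid5_disks < 3:
        raise ValueError("RAID5 needs at least three disks")
    return [
        Protection("Hardware RAID1", 1 / 2, 2),  # write both mirror copies
        # RAID5: read old data + old parity, write new data + new parity
        Protection(f"RAID5 ({raid5_disks} disks)",
                   (raid5_disks - 1) / raid5_disks, 4),
        Protection("ASM normal redundancy (2-way mirror)", 1 / 2, 2),
        Protection("ASM high redundancy (3-way mirror)", 1 / 3, 3),
    ]


def usable_tb(raw_tb: float, option: Protection) -> float:
    """Usable capacity for a given amount of raw disk space."""
    return raw_tb * option.usable_fraction


if __name__ == "__main__":
    for opt in protection_options(raid5_disks=8):
        print(f"{opt.name:38} usable of 8 TB raw: {usable_tb(8, opt):5.2f} TB, "
              f"disk I/Os per write: {opt.backend_ios_per_host_write}")
```

For 8 TB of raw disk, RAID5 over 8 disks leaves 7 TB usable against 4 TB for mirroring, which is why it is the economical choice, but each small write costs 4 disk I/Os instead of 2, which is why it suits neither write-intensive workloads nor redo logs.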

What Type of Striping Works Best?

My suggestion here: do not use LVM, and do not go with “no striping”. According to Oracle, stripe-on-stripe (combining Oracle ASM striping with RAID striping) also offers good performance; however, I do not recommend it either.

Availability Level 2a: Database in Cold Failover Cluster mode (CFC)

[Figure: Oracle database in Cold Failover Cluster mode on Oracle Clusterware]

Level 2 availability may consist of a single Oracle database instance in Cold Failover Cluster (CFC) mode at the same physical location, or of Oracle Data Guard replicating the database to failover hardware. Some downtime is incurred during failover to the redundant system. Since, as mentioned above, the levels of availability build on each other, this configuration also includes storage-level protection. CFC and Oracle Data Guard are separate worlds, each with many configuration options. From Availability Level 2 onward, also keep complexity in mind: aim to build a simple database system rather than contrive something complex.

Cold Failover Cluster (CFC) is a more complex topic because it is offered not only by Oracle but also by other vendors. This database configuration requires the following:

– A cluster of two or more nodes
– An OS with clusterware
– Shared storage (SAN) on Oracle ASM, OCFS2, NFS or any certified cluster file system (not mandatory, but preferable)
– Single database instance in cold failover mode
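The core of a cold failover cluster can be sketched as a heartbeat check plus an ordered start of a resource group on the surviving node. The Python model below is hypothetical: the node names, the timeout and the resource names are illustrative, and real clusterware products additionally fence the failed node to prevent split brain, which the sketch does not model.

```python
# Hypothetical model of a two-node cold failover cluster (CFC): only the
# active node runs the database resource group. When the active node's
# heartbeat is older than the timeout, the first healthy survivor starts
# the group. Fencing of the failed node is NOT modelled.
from dataclasses import dataclass, field

# Illustrative start order of the resource group on the new active node.
RESOURCE_GROUP = ["storage", "virtual_ip", "listener", "db_instance"]


@dataclass
class ColdFailoverCluster:
    nodes: list[str]
    active: str
    heartbeat_timeout_s: float
    last_heartbeat: dict[str, float] = field(default_factory=dict)
    log: list[str] = field(default_factory=list)

    def heartbeat(self, node: str, now: float) -> None:
        self.last_heartbeat[node] = now

    def _alive(self, node: str, now: float) -> bool:
        last = self.last_heartbeat.get(node, float("-inf"))
        return now - last <= self.heartbeat_timeout_s

    def check(self, now: float) -> str:
        """Run one monitoring pass and return the node owning the database."""
        if self._alive(self.active, now):
            return self.active
        survivors = [n for n in self.nodes
                     if n != self.active and self._alive(n, now)]
        if not survivors:
            return self.active  # no healthy node to fail over to
        self.active = survivors[0]
        # Cold failover: the instance starts from scratch on the new node
        # (including crash recovery), which is where the downtime comes from.
        for resource in RESOURCE_GROUP:
            self.log.append(f"start {resource} on {self.active}")
        return self.active
```

For example, with node1 active and a 30-second timeout, if node1 stops sending heartbeats while node2 keeps reporting, the first check after the timeout makes node2 active and logs the start of each resource in order.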

I have personally worked with the following configurations.

Cold Failover Cluster based on OpenVMS and Tru64 with TruCluster Software

+ Very stable
+ Uses its own cluster file system
– Costly
– Outdated solution

Cold Failover Cluster based on Linux RedHat Clusterware

+ Inexpensive
+ Makes sense if you already run Red Hat Linux
+ Simple to manage but a major upgrade can be a pain
– Only one Oracle binary set possible for every instance
– Storage array (SAN) is not shared but mounted to active cluster node
– Host-based mirroring in a stretched cluster requires different Linux LVM with more maintenance overhead

Cold Failover Cluster based on HP and ServiceGuard software

+ Complex but stable
+ Available on HP-UX and Linux
+ Host-based mirroring with HP-UX LVM possible
+ Advanced cluster split-brain handling
+ Host-based mirroring with ASM possible
– Only one Oracle binary set possible for every instance
– Storage array (SAN) is not shared but mounted to active cluster node
– Comparatively expensive

Cold Failover Cluster based on Oracle Clusterware with failover scripts

+ Not licensed separately when used to cluster a licensed Oracle product (such as the database)
+ Also works with Oracle Clusterware and Database releases before 11gR2
+ Storage array (SAN) is shared between cluster nodes
+ Host-based mirroring with ASM possible
– No online database relocation, unlike RAC One Node
– Oracle does not directly support the action scripts, since they are custom code
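Because the action script is custom code, its overall shape is worth showing. Oracle Clusterware invokes a resource’s action script with an action argument such as start, stop or check (11gR2 also uses clean) and treats exit status 0 as success; for check, 0 means the resource is online. The Python skeleton below follows that convention, but the marker file and the `CFC_DEMO_STATE` environment variable are hypothetical stand-ins for the real work (starting, probing and stopping the instance), which is left out.

```python
#!/usr/bin/env python3
# Skeleton of a custom action script for a cold-failover database resource.
# Clusterware passes one action argument; exit status 0 means success
# (for "check": the resource is online). The marker file is a hypothetical
# stand-in for the real work, which is omitted.
import os
import sys

# Hypothetical marker file standing in for "instance is running".
STATE_FILE = os.environ.get("CFC_DEMO_STATE", "/tmp/cfc_demo_db.online")


def start() -> bool:
    # Real script: start the listener and the instance on this node.
    with open(STATE_FILE, "w") as marker:
        marker.write("online\n")
    return True


def stop() -> bool:
    # Real script: shut the instance down cleanly.
    if os.path.exists(STATE_FILE):
        os.remove(STATE_FILE)
    return True


def check() -> bool:
    # Real script: verify instance processes and a test connection.
    return os.path.exists(STATE_FILE)


def clean() -> bool:
    # Real script: forcibly remove leftovers of a failed instance.
    return stop()


ACTIONS = {"start": start, "stop": stop, "check": check, "clean": clean}


def main(argv: list[str]) -> int:
    """Dispatch the action named in argv[1] and return the exit status."""
    if len(argv) != 2 or argv[1] not in ACTIONS:
        print(f"usage: {argv[0]} start|stop|check|clean", file=sys.stderr)
        return 2
    return 0 if ACTIONS[argv[1]]() else 1


# Dispatch only when an action argument is given, as Clusterware does.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

In a real deployment the script would be registered as the resource’s action script, and each handler would call your own database start, check and shutdown logic instead of touching a marker file.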

Cold Failover Cluster based on Oracle Clusterware and Oracle RAC One Node

+ Oracle’s recommended cold failover solution
+ Provides higher database availability than the other options
+ Online migration & patching possible
+ Single vendor solution
+ Ready to scale to full RAC
+ Storage array (SAN) is shared between cluster nodes
+ Host-based mirroring with ASM possible
– Works only with 11gR2 and later
– Licensed separately (roughly +25% on top of the CPU license)

In summary, in this article I shared some best practices and my experience in architecting database systems, describing technical solutions based on a single server, SAN, clusters, Clusterware, Oracle ASM and a database in Cold Failover Cluster (CFC) mode. Next time I’ll continue with the further availability levels, covering Oracle Data Guard, Oracle RAC and the Oracle Maximum Availability Architecture (MAA). You can find more examples in my Oracle HA presentation at the DOAG 25th anniversary.
