Oracle database High Availability – Extended RAC and MAA
In my previous article I described best practices for architecting database systems at High Availability (HA) Levels 2 and 3, covering technical solutions based on Data Guard, standby databases, and Oracle Real Application Clusters (RAC). This time I’ll continue to Availability Level 4, describing cluster database solutions based on Oracle Extended RAC and Oracle Maximum Availability Architecture (MAA) principles that combine Oracle RAC with standby databases. I suggest reviewing a few of my previous articles on database High Availability, including the Availability Continuum graph that illustrates the gains in availability achieved when progressing between levels:
High Availability and SLA requirements for Oracle database
Oracle High Availability – ASM, Clusterware, Cold Failover
Oracle database High Availability – Data Guard, Standby, RAC
Availability Level 3b: Recovery via Redundant Components – Extended RAC
The Oracle RAC architecture is designed primarily as a scalability and availability solution residing in a single data center. Under certain circumstances, however, it is possible to build and deploy an Oracle RAC system in which the cluster nodes are separated by up to 100 kilometers, sharing the same RAC database across multiple RAC instances spread over two sites. This architecture is referred to as Extended RAC or a Metro cluster. We can consider it an extension of Availability Level 3.
The advantages of using Oracle RAC on extended clusters include:
– Ability to fully use all system resources without jeopardizing the overall failover times for instance and node failures
– Extremely rapid recovery if one site fails
– All of the usual Oracle RAC benefits
An Oracle Extended RAC cluster is an architecture that provides extremely fast recovery from a site failure and allows all nodes, at all sites, to actively process transactions as part of a single database cluster. When the two data centers are located relatively close to each other (a Campus cluster), an extended cluster can provide good protection against some disasters, but not all. Fire, flooding, and site power failure are just a few examples of localized disasters that can take out an entire data center yet remain survivable with a second site.
Extended RAC does not require any special software beyond the normal RAC installation. However, there are strict requirements on the Extended Cluster interconnect, particularly in terms of network latency. To extend a RAC cluster to a site separated from your data center by more than ten kilometers, you must use Dense Wavelength Division Multiplexing (DWDM) over dark fiber to achieve acceptable performance. Dark fiber is unused fiber optic cable or strands, mainly sold by telecom providers. DWDM is a technology that uses multiple lasers to transmit several wavelengths of light simultaneously over a single optical fiber. DWDM dramatically increases the capacity of existing single-fiber infrastructure, supporting more than 150 wavelengths, each carrying up to 10 Gbps. All traffic between the two sites is sent through the DWDM and carried on dark fiber. This includes network and heartbeat traffic, and can include mirrored disk writes.
Extended RAC Disk Mirroring
With extended RAC, you can also use disk mirroring to extend the reach of the cluster. Although there is only one RAC database, each data center has its own set of storage that is synchronously mirrored using either a cluster-aware, host-based solution (LVM or Oracle ASM) or an array-based mirroring solution.
With host-based mirroring, the disks appear as one set, and all I/Os are sent to both sets of disks. This solution requires the clusterware and the volume manager to be closely integrated; ASM is Oracle’s recommended solution.
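As an illustration, host-based mirroring with ASM is typically configured with one failure group per site, so that ASM keeps a full copy of every extent in each data center. A minimal sketch, assuming hypothetical disk paths and names:

```sql
-- Normal-redundancy disk group with one failure group per site,
-- so ASM mirrors every extent between the two data centers.
-- Disk paths (/dev/sitea_disk*, /dev/siteb_disk*) are hypothetical.
CREATE DISKGROUP data NORMAL REDUNDANCY
  FAILGROUP sitea DISK '/dev/sitea_disk1', '/dev/sitea_disk2'
  FAILGROUP siteb DISK '/dev/siteb_disk1', '/dev/siteb_disk2'
  ATTRIBUTE 'compatible.asm' = '11.2';
```

With this layout, each instance can be told to serve reads from its local failure group via the ASM_PREFERRED_READ_FAILURE_GROUPS initialization parameter, which matters over a stretched interconnect where remote reads pay the full inter-site latency.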
With array-based mirroring, all I/Os are sent to one site and are then mirrored to the other. In effect, this solution is a primary/secondary site setup: if the primary site fails, all access to the primary disks is lost. This configuration reduces server mirroring overhead and simplifies Clusterware configuration and maintenance. However, an outage may be incurred before you can switch to the secondary site. Some modern array-based mirroring solutions (such as Hitachi HAM) can eliminate that outage.
Achieving Quorum with Extended RAC
With Extended RAC, designing the cluster in a manner that ensures the cluster can achieve quorum after a site failure is a critical issue. As far as voting disks are concerned, a node must be able to access strictly more than half of the voting disks at any time, or that node will be evicted from the cluster. Extended clusters are generally implemented with only two storage systems, one at each site. This means that the site that houses the majority of the voting disks is a potential single point of failure for the entire cluster. To prevent this potential outage, Oracle Clusterware supports a third voting disk on an inexpensive, low-end, standard NFS-mounted device somewhere on the network. It is thus recommended to put this third NFS voting disk on a dedicated server visible from both sites.
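To sketch what this looks like with ASM-managed voting files (11.2 onward), the third NFS voting file is placed in a quorum failure group, a special failure group that holds only a voting file and no data. All paths and names below are hypothetical:

```sql
-- Disk group for OCR/voting files: one failure group per site plus a
-- QUORUM failure group on the NFS path visible from both sites.
-- Disk paths and failgroup names are hypothetical.
CREATE DISKGROUP ocrvote NORMAL REDUNDANCY
  FAILGROUP sitea DISK '/dev/sitea_ocr1'
  FAILGROUP siteb DISK '/dev/siteb_ocr1'
  QUORUM FAILGROUP nfsq DISK '/voting_nfs/vote_3'
  ATTRIBUTE 'compatible.asm' = '11.2';
```

Afterwards, `crsctl query css votedisk` should list three voting files, one in each failure group; losing an entire site then still leaves the surviving node able to see two of the three.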
Availability Level 4: Active and Passive Recovery – MAA
Level 4 availability is compliant with Oracle Maximum Availability Architecture (MAA) principles and represents Level 3 extended with a passive failover installation at a physically remote site, using Data Guard and/or Oracle Streams/GoldenGate to replicate the RAC database to a failover RAC database. MAA includes best practices for critical infrastructure components, including servers, storage, and network, and provides the most comprehensive architecture for reducing downtime during scheduled and unscheduled outages.
This Availability Level 4 design is a multiple-site solution and consists of the following:
– A Level 3–design primary site containing the RAC database
– A remote secondary identical site with a Physical Standby database (single or RAC) and a Logical Standby (as an alternative a database with Active-Active replication using Oracle GoldenGate)
– Active Data Guard can be used for online reporting (extra licensed)
– Data Guard switchover and failover functions allow the roles to be traded between sites
– Shared storage for database mirrored between and accessible from both sites
– Site failover performed by DNS between load balancers
– Network infrastructure and device redundancy within and between sites
– Power redundancy
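The Data Guard part of such a configuration is usually managed with the Data Guard broker. A minimal DGMGRL sketch, assuming hypothetical database names proddb (the primary RAC database) and stbydb (the remote physical standby):

```
DGMGRL> CREATE CONFIGURATION 'maa' AS
          PRIMARY DATABASE IS 'proddb'
          CONNECT IDENTIFIER IS proddb;
DGMGRL> ADD DATABASE 'stbydb' AS
          CONNECT IDENTIFIER IS stbydb
          MAINTAINED AS PHYSICAL;
DGMGRL> ENABLE CONFIGURATION;
DGMGRL> SHOW CONFIGURATION;
```

SHOW CONFIGURATION should report SUCCESS once redo transport and apply are healthy; the broker then coordinates role transitions and, with an observer, optional Fast-Start Failover.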
Identical site configuration is recommended to ensure that performance is not sacrificed after a failover or switchover. Symmetric sites also enable processes and procedures to be kept the same between sites, making operational tasks easier to maintain and execute. Each site consists of redundant components and routing mechanisms, so that requests are always serviceable even in the event of a failure. Most outages are resolved locally. Client requests are always routed to the site playing the production role.
After a failover or switchover operation occurs due to a serious outage, client requests are routed to the other site, which assumes the production role. Each site contains a set of application servers or mid-tier servers. The site playing the production role contains a production database using RAC to protect against host and instance failures. The site playing the standby role contains one Physical Standby database and one Logical Standby database managed by Data Guard. Data Guard switchover and failover functions allow the roles to be traded between sites. Unlike Data Guard with SQL Apply, Oracle Streams enables updates on the replica and supports heterogeneous platforms with different database releases. Therefore, Oracle Streams may provide the fastest approach for database upgrades and platform migrations.
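Both role transitions can be driven from DGMGRL in a broker-managed configuration; the standby name stbydb below is hypothetical:

```
-- Planned role transition, no data loss (e.g. site maintenance):
DGMGRL> SWITCHOVER TO 'stbydb';

-- Unplanned transition after losing the primary site:
DGMGRL> FAILOVER TO 'stbydb';
```

After a switchover the old primary automatically becomes a standby; after a failover the old primary must be reinstated (or recreated) before it can rejoin the configuration.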
To summarize, in this article I covered the higher High Availability (HA) levels, describing cluster database solutions based on Oracle Extended Real Application Clusters (RAC) and Oracle Maximum Availability Architecture (MAA) principles that combine Oracle RAC with standby databases. What is left: some extras and Oracle 12c High Availability features.