RAC and single instance RMAN backup is similar, but…
Backup of a 11gR2 RAC database is similar to that of a single instance 11gR2 database.
There is an important difference though… assuming that you, like me, schedule a non-RAC backup by cron on the database node, using local RMAN scripting. If the node you scheduled the backup on is down, your backup simply cannot run!
So it’s obviously wrong to use the same method on a RAC cluster node that might be offline at the start of the backup, on purpose or not. To make sure my RMAN backup still runs when all but one node is offline, I use DBMS_SCHEDULER in combination with an RMAN script placed in a directory that is shared by all nodes.
Don’t change your RMAN backup script, but…
The RMAN scripting itself is not in scope here… as long as you don’t use explicit connections or parallelism, and do use the FRA on ASM for all your recovery files, your code can remain the same, with one notable exception….
As of 11gR2 RAC, Oracle wants your snapshot controlfile to be on a shared location [ID 1472171.1]… I use the FRA on ASM, but an ACFS mounted directory will also do.
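Relocating the snapshot controlfile takes a single RMAN CONFIGURE command; the disk group and file name below are examples only, adjust them to your own FRA disk group and database name:

RMAN> CONFIGURE SNAPSHOT CONTROLFILE NAME TO '+FRA/RACDB/snapcf_RACDB.f';
RMAN> SHOW SNAPSHOT CONTROLFILE NAME;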
If you are familiar with 11gR2 Grid Infrastructure and RAC, creating a shared ACFS mounted directory should pose no problems. Just use ASMCA to create an ACFS volume on one of your ASM disk groups, and then the shared file system on some pre-created OS directory…. I use /shared.
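If you prefer the command line over ASMCA, roughly the same can be done with asmcmd and mkfs; the disk group, volume name, size and volume device below are examples only, and the volume device name on your system will differ (check it with volinfo):

[root]# su - grid -c "asmcmd volcreate -G DATA -s 5G backupvol"
[root]# su - grid -c "asmcmd volinfo -G DATA backupvol"
[root]# mkdir -p /shared
[root]# mkfs -t acfs /dev/asm/backupvol-123
[root]# acfsutil registry -a /dev/asm/backupvol-123 /shared

Registering the file system with acfsutil makes Grid Infrastructure mount it on all nodes at startup, instead of mounting it by hand with “mount -t acfs”.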
Use DBMS_SCHEDULER for scheduling…
In 11gR2, DBMS_SCHEDULER is RAC-aware and DBMS_JOB is not. That’s why you should use DBMS_SCHEDULER instead of DBMS_JOB to schedule the backup of a RAC database. Before defining the job:
The local time zone of all RAC nodes must be identical, and preferably the same as the DBMS_SCHEDULER default time zone… check the OS time zones and the default time zone of DBMS_SCHEDULER.
1) On all RAC nodes, check the local time zone ( /etc/localtime in OL 6.3 ), and set identical:
[root etc]# ln -s /usr/share/zoneinfo/Europe/Amsterdam localtime
2) Check date and time:
[root etc]# date
3) Configure the default timezone in DBMS_SCHEDULER with “sqlplus / as sysdba”:
SQL> exec dbms_scheduler.set_scheduler_attribute ('DEFAULT_TIMEZONE', 'Europe/Amsterdam');
4) Check the default timezone value:
SQL> select value from dba_scheduler_global_attribute where attribute_name = 'DEFAULT_TIMEZONE';
DBMS_SCHEDULER is able to process OS shell scripts, but needs a “credential” for OS login… configure this user first, later to be used by the backup job.
1) Create an OS credential for the backup job with “sqlplus / as sysdba”:
SQL> exec dbms_scheduler.create_credential( credential_name => 'oracle', username => 'oracle', password => '****');
DBMS_SCHEDULER is quite impressive, and delivers a lot of possible options for the definition of a job. I stick to the simple ones, and accept as many default values as possible.
1) Create the backup job with “sqlplus / as sysdba”, but do not enable or schedule it yet.
If you wonder about the second argument… this is the one parameter of my RMAN script; it controls the number of days the database copy lags the original, and the selection of archivelogs to be deleted at the end of the backup.
SQL> begin
       dbms_scheduler.create_job(
         job_name            => 'RMAN_BACKUP',
         job_type            => 'EXECUTABLE',
         job_action          => '/bin/sh',
         number_of_arguments => 2,
         start_date          => SYSTIMESTAMP,
         credential_name     => 'oracle',
         auto_drop           => FALSE,
         enabled             => FALSE);
       dbms_scheduler.set_job_argument_value(
         job_name          => 'RMAN_BACKUP',
         argument_position => 1,
         argument_value    => '/shared/scripts/dbbackup.sh');
       dbms_scheduler.set_job_argument_value(
         job_name          => 'RMAN_BACKUP',
         argument_position => 2,
         argument_value    => '4');
     end;
     /
     commit;
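For reference, a minimal sketch of what /shared/scripts/dbbackup.sh might look like, using Oracle’s incrementally updated backup pattern; the tag name is an assumption, the environment settings are site-specific, and your own script will certainly differ:

#!/bin/sh
# $1 = number of days the database copy lags the original
LAG=$1
# external DBMS_SCHEDULER jobs start with an empty environment,
# so set ORACLE_HOME, ORACLE_SID and PATH here first (site-specific)
$ORACLE_HOME/bin/rman target / <<EOF
run {
  recover copy of database with tag 'LAGCOPY'
    until time 'SYSDATE - ${LAG}';
  backup incremental level 1
    for recover of copy with tag 'LAGCOPY' database;
  delete noprompt archivelog all
    completed before 'SYSDATE - ${LAG}';
}
EOF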
Test the backup job…
Now, first of all, test the backup job. If you run into problems, check that your code uses full file and directory names – mandatory with DBMS_SCHEDULER – and that your script has the executable bit set.
SQL> begin
       dbms_scheduler.purge_log( job_name => 'RMAN_BACKUP' );
       dbms_scheduler.run_job( 'RMAN_BACKUP' );
     end;
     /
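The result of a test run can be checked afterwards in the scheduler’s run log; STATUS should read SUCCEEDED, and for a failed external job ADDITIONAL_INFO often holds the error text:

SQL> select status, error#, additional_info
     from   dba_scheduler_job_run_details
     where  job_name = 'RMAN_BACKUP'
     order  by log_date;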
Schedule the backup job…
If the test is successful, schedule the backup job, in this case every evening at 22:05:
SQL> begin
       dbms_scheduler.set_attribute(
         name      => 'RMAN_BACKUP',
         attribute => 'repeat_interval',
         value     => 'freq=daily; byhour=22; byminute=05; bysecond=0;');
       dbms_scheduler.enable( 'RMAN_BACKUP' );
     end;
     /
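A quick check that the job is now enabled and scheduled as intended:

SQL> select enabled, state, next_run_date
     from   dba_scheduler_jobs
     where  job_name = 'RMAN_BACKUP';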
The job starts on any instance that’s up and running, but…
Keep in mind that this backup configuration is “sticky”… meaning that DBMS_SCHEDULER always tries to start the backup on the node where the last backup was successfully executed. And although this configuration works well as long as the instance used by RMAN stays online during the backup, failover of RMAN backup sessions to the remaining up-and-running instances after a sudden node or instance collapse is untested…. see the next 2 paragraphs.
Use a RAC service to limit the job’s RMAN database connections…
Not strictly necessary, but I use an extra RAC service ( CLUSTER_BACKUP ), running on all nodes, for RMAN to connect by. The standard “rman target /” in my standard backup script then changes to “rman target sys/*****@RAC_BACKUP_NETSERVICE”, where “RAC_BACKUP_NETSERVICE” is the TNSNAMES.ORA net service name, built on host <LOCAL_NODE_NAME> and service_name CLUSTER_BACKUP. A TNSNAMES.ORA with the above settings is configured on each and every node. This makes certain that RMAN doesn’t spread its database connections over more than one instance once the backup job has started.
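In 11gR2 such a service is added and started with srvctl, and the net service name is an ordinary TNSNAMES.ORA entry; the database name, instance names and port below are examples only:

[oracle]$ srvctl add service -d RACDB -s CLUSTER_BACKUP -r RACDB1,RACDB2
[oracle]$ srvctl start service -d RACDB -s CLUSTER_BACKUP

RAC_BACKUP_NETSERVICE =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = <LOCAL_NODE_NAME>)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = CLUSTER_BACKUP)
    )
  )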
RMAN backup should failover, but…
I’m not really sure whether this is still necessary in 11gR2, and by adjusting the TNSNAMES.ORA entry – from host <LOCAL_NODE_NAME> to host <CLUSTER_SCAN_NAME> – you may in theory achieve some cluster failover capability. After all, why should RMAN connections be so very different from standard client connections, which all – if set up correctly! – use the <CLUSTER_SCAN_NAME> to connect..? Has anyone tested a failover with this kind of setting during a backup and a sudden instance shutdown?
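For completeness, such a SCAN-based entry would only differ in the HOST line; the port is again an example, and as said, its behaviour during a sudden instance failure mid-backup is untested:

RAC_BACKUP_NETSERVICE =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = <CLUSTER_SCAN_NAME>)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = CLUSTER_BACKUP)
    )
  )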
Let me know the results!