Recovering Very Large Databases in the Least Amount of Time

2

Sometimes you come across beautiful things in your life, you wish you had known before, because they were there all the time, waiting for that moment to be discovered – by you.

Recently I discovered one of these things. Shamefully I confess, this brilliant feature of RMAN is around for some time “- since 10g already –  but I have never been lucky enough to explore it, until recently.

I am working for a customer now, who has a number of very large databases, all of them around the size of 12 to 25 Terabytes. These databases are all very heavily used by all kinds of activities; each of them is producing over 150 Gigabytes of redo per day.
Making a full online backup (to disk) of these databases takes more than a week – per database…

Can you imagine what it takes to restore and recover this kind of databases? The customer estimates around 2 weeks. This is not an option, because these databases hold data that is very frequently consulted and processed.

So, what do you do in such a case? Well, RMAN has a nice feature called backup as copy. This makes an image copy of your database and puts it into another location, defaulting to the Flash Recovery Area (assuming you have set up your parameters DB_RECOVERY_FILE_DEST and DB_RECOVERY_FILE_DEST_SIZE). I knew about this feature because I have migrated lots of databases from file system to ASM, for example.

What I didn’t know is that you can use incremental backups to recover this copy of the database. A jewel in the RMAN feature portfolio!

To make a full copy backup of your database:

allocate channel for maintenance type disk;
configure controlfile autobackup on; 
configure default device type to disk; 
 
run { 
BACKUP AS COPY DATABASE TAG ‘FULL_COPY’;
}

The above is not mandatory, the following script will do this for you as well.

allocate channel for maintenance type disk;
configure controlfile autobackup on; 
configure default device type to disk; 
 
run { 
RECOVER COPY OF DATABASE WITH TAG ‘IMG_COPY’ UNTIL TIME ‘SYSDATE – 1’; 
BACKUP INCREMENTAL LEVEL 1 FOR RECOVER OF COPY WITH TAG ‘IMG_COPY’ DATABASE PLUS ARCHIVELOG DELETE INPUT;
copy current controlfile to ‘/<whateverlocation>/cp_cntrl.ctl’; 
}

OK, let’s break this down into small pieces:

RECOVER COPY OF DATABASE WITH TAG ‘IMG_COPY’ UNTIL TIME ‘SYSDATE – 1’;

This command will not do anything until there is a backupset with the tag FULL_COPY. When there is a backupset, it will take it and recover the copy of the database using this backupset (see next command).

BACKUP INCREMENTAL LEVEL 1 FOR RECOVER OF COPY WITH TAG ‘IMG_COPY’ DATABASE PLUS ARCHIVELOG DELETE INPUT;

This command will create a level 1 backup if there is a level 0 available. During the first run of this script, there is no level 0, so the level 0 will be created (which is basically equal to a backup as copy database).
During the second run of the script, a level 1 incremental backup will be created, which can be used in effect for the recover command as discussed above.

Basically, you are creating an image backup of your database in the Flash Recovery Area. This image backup can be used for restore and media recovery. Since this will be faster than conventional backups, it could be a solution to look at when you demand quick recovery. Well, nothing spectacular, this solution has been discussed a number of times on the Internet already.

In the situation my customer is in, we’re talking different stuff. The database is so large, taking the initial backup takes more than a week, let alone a restore, and we’re not even talking about recovery yet. So we chose a different approach to the ‘recovery’ of the database in case it becomes damaged.

RMAN has this tiny little command that does it all in this case:

RMAN> SWITCH DATABASE TO COPY;

This only works when you have your controlfile available, but hey, that is what we provide for (twice) in the backup script, don’t we? And not for nothing.

copy current controlfile to ‘/<whateverlocation>/cp_cntrl.ctl’;

During the script we autobackup the controlfile and after the backup has been taken, an image copy of the controlfile is made to a separate location, eliminating the need to restore it from autobackup with all of its consequences. If somehow we have lost this image backup of the controlfile as well, we can always fall back to the autobackup.

When we want to recover our lost database in the least amount of time, we recover the copy of the database, assure the availability of the controlfile, then switch the database to the copy, which in effect updates the controlfile with the new location of all the datafiles. After this has been completed, we can open the database (with or without resetlogs, depending on incomplete or complete recovery), and off we go.

The database is available now, but you will be left with a database that doesn’t have a valid backup anymore. Remember we’re using the database copy in the FRA, which in essence was our backup. No panic, redefine the FRA (to e.g. the location of the original database) and start your rman scripts to create a new copy.
Optionally, if required, you can relocate your datafiles according to your needs. BTW, you can use RMAN for this as well:

RMAN> backup as copy datafile <datafile name | datafile number> format '<new location>';
RMAN> switch datafile <datafile name | datafile number> to copy;

The downside of this solution is that you would need twice the amount of diskspace for your database, because you are holding a copy of it on disk. But compare the cost of that against the cost of 2 weeks inavailability of the database. I think in most cases (with current storage costs) the costs will be far less than the benefits.

Share.

About Author

2 Comments

  1. Alexander,
    True, so true, however, what about the replication of corruptions? Obviously, one could consider flashback database for this. Still, my opinion is to have a separate database copy (to be (point-in-time) recovered before being switched to) is a safer method. Additionally, switching to a remote SAN requires additional work by other disciplines (OS/Storage management), whilst switching to a copy is solely done by DBA, which in effect is more efficient.
    Arnoud

  2. Alxeander 'sure' Podkopaev on

    having “twice the amount of diskspace for your database” sound similar to “having two SAN with replication” to me. You switch to backup SAN and voila – you’re running you database again.