You consolidated your applications in one database, but now one application needs recovery…

Imagine this: you have consolidated your applications into one database (version 10gR2). You take a full backup every Tuesday and Thursday evening and an archivelog backup every few hours. These backups go to an NFS share, not straight to tape. Everything runs fine until one day a new release of one of the applications messes up its data. Let’s assume that happened on a Wednesday evening.

Application management wants you to restore the data of app_schema to just before the new misbehaving release went live.

DBA: Say what?

Mgmt: Yes, we have a backup, don’t we? So please put the data back to just before this release went live. We’ll come back with the exact time we want you to go back to.

What are your options?

1) You can restore and recover the complete database to a previous point in time.

Advantage: Pretty easy to do.

Disadvantage: All the other applications go back in time as well.

Mgmt: No that is not an option!

2) Use flashback database. The end result is as if you had done a point-in-time recovery, so Mgmt doesn’t like this option any more than option 1). And you need to have set up flashback logging beforehand, which wasn’t the case here.
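
For the record, a flashback of the whole database would have looked something like this (a sketch only; it assumes flashback logging had been enabled beforehand, which it wasn’t here, and the timestamp is the to-be-confirmed release time):

SHUTDOWN IMMEDIATE
STARTUP MOUNT
FLASHBACK DATABASE TO TIMESTAMP
  TO_TIMESTAMP('2013-09-20 17:15:00','YYYY-MM-DD HH24:MI:SS');
ALTER DATABASE OPEN RESETLOGS;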

3) Use expdp with flashback_time or flashback_scn.

For example:
expdp / DIRECTORY=DATA_PUMP LOGFILE=expdp_app_schema.log DUMPFILE=expdp_app_schema%u.dmp PARFILE=expdp.par
with expdp.par containing:
flashback_time="TO_TIMESTAMP('2013-09-20 17:04:00','YYYY-MM-DD HH24:MI:SS')"
SCHEMAS=app_schema_one, app_schema_two
PARALLEL=1
FILESIZE=10G

Tweak the parameters to your needs.
If you get any errors about undo (typically ORA-01555, snapshot too old)… you are out of luck: too much time has passed.
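
Before firing off a long-running export it’s worth a quick check of how far back undo can take you. A minimal sketch in SQL*Plus (TIMESTAMP_TO_SCN maps the target time to an SCN for flashback_scn; actual undo availability also depends on the size of the undo tablespace):

SHOW PARAMETER undo_retention

SELECT TIMESTAMP_TO_SCN(TO_TIMESTAMP('2013-09-20 17:04:00','YYYY-MM-DD HH24:MI:SS')) AS flashback_scn
FROM DUAL;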

Advantage: Pretty easy to do. No impact on other schemas.

Disadvantage: You need to have been alerted fairly quickly after the problem, or have lots of UNDO.

Unfortunately we were notified more than 12 hours later, so this is not an option either.

For the same reason the flashback table and flashback query commands aren’t an option either.
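
Had we been notified in time, a per-table flashback would have looked something like this (the table name is hypothetical; FLASHBACK TABLE requires row movement to be enabled):

ALTER TABLE app_schema.some_table ENABLE ROW MOVEMENT;
FLASHBACK TABLE app_schema.some_table TO TIMESTAMP
  TO_TIMESTAMP('2013-09-20 17:15:00','YYYY-MM-DD HH24:MI:SS');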

4) Recover the affected tablespace to a point in time (tablespace point-in-time recovery, TSPITR).

Luckily the application has its own tablespace(s), so this is a possibility. No other applications use the objects of this schema either. (For the details and more careful planning, see: http://docs.oracle.com/cd/B19306_01/backup.102/b14191/rcmtspit.htm)

Advantage: Fairly well documented by Oracle.

Disadvantage: It’s not something I do every day. It will probably affect the other applications in this database due to resource consumption (CPU, I/O, etc.). There are some limitations (see the link above), especially if you do not use a Recovery Catalog, and that is the case here.

5) Duplicate the database to another server and then use expdp and impdp to move the data back to the original production database.

Advantage: If needed, we can retry this, whereas according to the documentation the TSPITR can only be tried once due to the lack of a Recovery Catalog. Another advantage is that we can run this on another server and not impact the production server.

Disadvantage: We need to find a compatible server with the same or a similar OS installed, the same database version, and enough disk space.

After careful consideration and collaboration with colleagues we decide to go with option 5), with one adjustment: due to lack of space we skip the tablespaces that we don’t need. (The production database is several hundred GB.)

Steps taken.

First: I make sure the application tablespace has no dependencies with other tablespaces:

SELECT * 
FROM SYS.TS_PITR_CHECK 
WHERE (
        TS1_NAME IN ('APP_TBLSPC') 
        AND TS2_NAME NOT IN ('APP_TBLSPC')
      )
OR    (
        TS1_NAME NOT IN ('APP_TBLSPC') 
        AND TS2_NAME IN ('APP_TBLSPC')
      );
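
If this query returns no rows, the tablespace is self-contained: nothing inside APP_TBLSPC depends on objects outside it, and nothing outside depends on objects inside it.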

Second: I verify that no objects have been created after the time we need to go back to:

SELECT OWNER, NAME, TABLESPACE_NAME,
       TO_CHAR(CREATION_TIME, 'YYYY-MM-DD HH24:MI:SS')
FROM SYS.TS_PITR_OBJECTS_TO_BE_DROPPED
WHERE TABLESPACE_NAME IN ('APP_TBLSPC')
AND CREATION_TIME > TO_DATE('2013-09-20 17:15:00','YYYY-MM-DD HH24:MI:SS')
ORDER BY TABLESPACE_NAME, CREATION_TIME;
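
Any rows returned here are objects created after the recovery point; those would not survive the recovery, so they would have to be handled (exported or recreated) separately.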

Third: I want to see if I have the backup files needed to go back to the specified time. So I fire up RMAN, connect to the production database as my target, and run:

RESTORE TABLESPACE APP_TBLSPC UNTIL TIME "TO_DATE('2013-09-20 17:15:00','YYYY-MM-DD HH24:MI:SS')" preview;
RESTORE TABLESPACE APP_TBLSPC UNTIL TIME "TO_DATE('2013-09-20 17:15:00','YYYY-MM-DD HH24:MI:SS')" preview summary;

The last command is less detailed, but complete enough for my purpose.

And of course I need other tablespaces (like SYSTEM and UNDO) as well, thus I also run:

RESTORE DATABASE UNTIL TIME "TO_DATE('2013-09-20 17:15:00','YYYY-MM-DD HH24:MI:SS')" preview;
RESTORE DATABASE UNTIL TIME "TO_DATE('2013-09-20 17:15:00','YYYY-MM-DD HH24:MI:SS')" preview summary;

All the needed files are still there. I need to stop the regular backups for this database, as they delete obsolete backup files based on a recovery window of 4 days, and I do not want that to happen while I’m recovering.
Fourth:

Create an init.ora based on the production database. Change all destinations to locations on the server where I am going to create the duplicate. Remove some parameters that I deem unnecessary, and downsize sga_max_size and sga_target to something the duplicate server can handle.
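
A minimal sketch of what /data/oracle/DUP/DUP.ora could look like; every value below is an assumption for illustration, the real values come from the production init.ora:

db_name=DUP
control_files='/data/oracle/DUP/control01.ctl','/data/oracle/DUP/control02.ctl'
background_dump_dest='/data/oracle/DUP/bdump'
user_dump_dest='/data/oracle/DUP/udump'
core_dump_dest='/data/oracle/DUP/cdump'
log_archive_dest_1='LOCATION=/data/oracle/DUP/arch'
undo_management=AUTO
undo_tablespace=UNDOTBS1
compatible=10.2.0.4.0
sga_max_size=1G
sga_target=1G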

Create an spfile in $ORACLE_HOME/dbs, just in case:

export ORACLE_SID=DUP
sqlplus / as sysdba
create spfile from pfile='/data/oracle/DUP/DUP.ora';
exit

Create a password file for this database (in $ORACLE_HOME/dbs):

orapwd file=orapwDUP password=some_password entries=10

Start the DUP database with the NOMOUNT option (as there are no database files at all yet), so that RMAN can connect to this AUXILIARY database:

export ORACLE_SID=DUP
sqlplus / as sysdba
startup nomount pfile=/data/oracle/DUP/DUP.ora
exit
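
A quick sanity check that the instance is really waiting in NOMOUNT:

sqlplus / as sysdba
SELECT STATUS FROM V$INSTANCE;   -- should report STARTED
exit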

Fifth:

I’m interested in the time details, thus I set NLS_DATE_FORMAT in the shell where I will start RMAN (or I will not see the hour and minute details of my actions):

export NLS_DATE_FORMAT='DD-MON-YYYY HH24:MI:SS'

Create an rman.cmd file with the contents:

RUN
{
  ALLOCATE CHANNEL dev1 DEVICE TYPE DISK;
  ALLOCATE CHANNEL dev2 DEVICE TYPE DISK;
  ALLOCATE AUXILIARY CHANNEL aux1 TYPE DISK;
  ALLOCATE AUXILIARY CHANNEL aux2 TYPE DISK;
  SET UNTIL TIME "TO_DATE('2013-09-20 17:15:00','YYYY-MM-DD HH24:MI:SS')";
  SET NEWNAME FOR DATAFILE 1 TO '/data/oracle/DUP/system.dbf';
  SET NEWNAME FOR DATAFILE 3 TO '/data/oracle/DUP/sysaux01_1.dbf';
  SET NEWNAME FOR DATAFILE 7 TO '/data/oracle/DUP/sysaux01_2.dbf';
  SET NEWNAME FOR DATAFILE 12 TO '/data/oracle/DUP/sysaux01_3.dbf';
  SET NEWNAME FOR DATAFILE 15 TO '/data/oracle/DUP/sysaux01_4.dbf';
  SET NEWNAME FOR DATAFILE 27 TO '/data/oracle/DUP/sysaux01_5.dbf';
  SET NEWNAME FOR DATAFILE 28 TO '/data/oracle/DUP/sysaux01_6.dbf';
  SET NEWNAME FOR DATAFILE 36 TO '/data/oracle/DUP/users01.dbf';
  SET NEWNAME FOR DATAFILE 72 TO '/data/oracle/DUP/app_tblspc_1.dbf';
  SET NEWNAME FOR DATAFILE 73 TO '/data/oracle/DUP/app_tblspc_2.dbf';
  SET NEWNAME FOR DATAFILE 2 TO '/data/oracle/DUP/undo_1.dbf';
  SET NEWNAME FOR DATAFILE 9 TO '/data/oracle/DUP/undo_2.dbf';
  SET NEWNAME FOR DATAFILE 11 TO '/data/oracle/DUP/undo_3.dbf';
  SET NEWNAME FOR DATAFILE 13 TO '/data/oracle/DUP/undo_4.dbf';
  SET NEWNAME FOR DATAFILE 18 TO '/data/oracle/DUP/undo_5.dbf';
  SET NEWNAME FOR DATAFILE 19 TO '/data/oracle/DUP/undo_6.dbf';
  SET NEWNAME FOR TEMPFILE 1 TO '/data/oracle/DUP/temp.dbf';
  DUPLICATE TARGET DATABASE TO DUP
    SKIP TABLESPACE
      OTHER_APP_TBSPC1,
      OTHER_APP_TBSPC2,
      OTHER_APP_TBSPC3,
      OTHER_APP_TBSPC4,
      OTHER_APP_TBSPC5,
      OTHER_APP_TBSPC6,
      OTHER_APP_TBSPC7,
      OTHER_APP_TBSPC8,
      OTHER_APP_TBSPC9,
      OTHER_APP_TBSPC10,
      OTHER_APP_TBSPC11,
      OTHER_APP_TBSPC12,
      OTHER_APP_TBSPC13,
      OTHER_APP_TBSPC14,
      OTHER_APP_TBSPC15,
      OTHER_APP_TBSPC16,
      OTHER_APP_TBSPC17,
      OTHER_APP_TBSPC18,
      OTHER_APP_TBSPC19,
      OTHER_APP_TBSPC20,
      OTHER_APP_TBSPC21,
      OTHER_APP_TBSPC22,
      OTHER_APP_TBSPC23,
      OTHER_APP_TBSPC24,
      OTHER_APP_TBSPC25,
      OTHER_APP_TBSPC26
    PFILE='/data/oracle/DUP/DUP.ora'
    LOGFILE
      '/data/oracle/DUP/redo01.log' SIZE 500M REUSE,
      '/data/oracle/DUP/redo02.log' SIZE 500M REUSE,
      '/data/oracle/DUP/redo03.log' SIZE 500M REUSE;
}

And then, very important: make sure that my RMAN session can continue even if my session to the duplicate database server gets aborted, by using nohup like this (all on one line):

nohup rman target sys/syspw@production_server_name:1521/PROD AUXILIARY / cmdfile=/data/oracle/DUP/rman.cmd log=/data/oracle/DUP/rman.log > nohup.log 2>&1 &

The above command works if you start it on the server where you want to create the duplicate database (as the OS user that owns the Oracle software).

The documentation mentions that you should use “AUXILIARY sys/syspw@duplicate_server_name:1521/DUP”, or actually “AUXILIARY sys/syspw@DUP”, but when I test a SQL*Plus connection I get the message that the listener is blocking all connections to my DUP database. (Most likely because an instance in NOMOUNT registers with the listener in a BLOCKED state; a static entry in the listener.ora would work around that.) Connecting with sqlplus / as sysdba works, and as I have no time to figure out the listener issue I continue with the statement above. A bonus is that this connection method saves me from editing the tnsnames.ora.
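
For reference, a static registration in the listener.ora on the duplicate server should avoid the BLOCKED status. A sketch (the ORACLE_HOME path is an assumption):

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (ORACLE_HOME = /data/oracle/product/10.2.0)
      (SID_NAME = DUP)
    )
  )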

With a tail -f on the alert_DUP.log and on rman.log I can monitor what’s going on. The alert file is actually more informative than I’d expected.
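
On this system that boils down to the following two commands (the bdump path depends on background_dump_dest in the init.ora, so treat it as an assumption):

tail -f /data/oracle/DUP/bdump/alert_DUP.log
tail -f /data/oracle/DUP/rman.log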

Depending on the hardware (CPUs, disks, disk speed, etc.) and the size of the database I need to duplicate, the duplicate database will be there after minutes, hours, or days…

I use two channels each for the target and auxiliary databases because I have two cores that I can fully utilize.

On another system I could duplicate ~370 GB within 2 hours, but this system takes more than 4 hours for ~280 GB of backup files, of which less than 11 GB ends up in the DUP database.

Sixth: The last thing to do is to use expdp to back up the remaining data in the production database for the schema involved, export the schema from the DUP database as well, clean out the schema in production (checking that no other objects become invalid), and import the dump file from the duplicate into the production database with impdp.
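
A sketch of those final steps (credentials, directory object, and file names are all assumptions; run the middle export on the duplicate server with ORACLE_SID=DUP):

# Safety net: export the current (post-release) state from production
expdp system/pw@PROD DIRECTORY=DATA_PUMP SCHEMAS=app_schema DUMPFILE=app_schema_after_release.dmp LOGFILE=app_schema_after_release.log

# Export the pre-release state from the duplicate
expdp system/pw DIRECTORY=DATA_PUMP SCHEMAS=app_schema DUMPFILE=app_schema_before_release.dmp LOGFILE=app_schema_before_release.log

# After cleaning out the schema in production, import the pre-release state
impdp system/pw@PROD DIRECTORY=DATA_PUMP SCHEMAS=app_schema DUMPFILE=app_schema_before_release.dmp LOGFILE=app_schema_import.log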

DBA: The restore is done! The application can be tested.

Mgmt: Everything is working again as it is supposed to! Thanks for a job well done!
