My customer needed to migrate to a new set of datacenters, and while other services were able to “lift and shift”, the Oracle RAC database servers had to be rebuilt in the new datacenter because the underlying platform wasn’t able to migrate the shared disks. Fortunately the environment was already as HA as it could be, by using Dataguard, so migration was nothing more than doing switchovers and duplicating databases.
As the networks in both old and new datacenters were interconnected, same subnet, same broadcast domain, the newly installed machines could use the same name and IP adresses as their old counterparts. It requires of course that the old machines are turned off before starting the installation. We’ve choosen this option to prevent too much configuration work at the application side. The application is able to connect to both RAC clusters, and has a sane connect timeout setting (!) in its connection details, so the absense of a cluster for a while should not give a problem.
So the basic method for migration is:
-
- Do a dataguard switchover if required
- Disable fast-start failover and remove the standby database from dataguard configuration
- If not done already, save relevant listener.ora and tnsnames.ora files
- Stop database and cluster(ware) on the standby database cluster and shutdown the machines
- Create/provision/have someone build your new infrastructure in the new datacenter
- Install and configure the cluster software, and install the database software
- Don’t forget to patch (<—See here, a little pitfall)
- Configure listener, and database tnsnames.ora
- copy password file, spfile, duplicate database
- add database to dataguard configuration again and apply settings to re-enable fast-start failover
- repeat above steps for the remaining cluster… (and repeat for the other environments)
These steps are a bit global, but give a general idea how the migration works. One thing not shown in the list above is the fact there are more databases on each cluster, and most of the time one cluster will contain both a primary database and a standby database. during the migration there will be a time both primary databases are on the same cluster.
When doing a lot of duplications and Dataguard configurations there are some common pitfalls you find. One thing I noticed was an error where dataguard couldn’t start applying the redo stream, while the passwordfile was correct and passwords were correct. With a password file there are two possible issues:
- The password file should be on both primary and standby on ASM OR a local file. It can not be mixed
- The password file format version is not the same
The last issue will report in the alert log from the apply instance on the standby database server:
ORA-46952: standby database format mismatch for password file '+DBNAME_DATA/DBNAME/PASSWORD/pwddbname.ora'
I found this issue when i copied the password file to asm by using orapwd, it silently upgraded the password file to the 12.2 format version. To find out the version of your password file (and the location!) use the following query:
SQL> select file_name, format from v$passwordfile_info; FILE_NAME FORMAT ---------------------------- ------ +DB_DATA/DB/PASSWORD/pwddb 12.2 -- result on the primary database FILE_NAME FORMAT ------------------------------- ------ +DBP_DATA/DBP/PASSWORD/pwddbp 12
To resolve this issue, copy the password file again to ASM but add the argument “format=12” to orapwd, now the redo apply should be able to start.
Another issue can arise where order is important: Before you can remove a database from a Dataguard configuration it should not be a part of a fast-start failover configuration. If you do not, the following message will DGMGRL tell you:
“Error: ORA-16867: cannot remove a database that is specified in a FastStartFailoverTarget configuration property”
So to prevent above message from appearing, disable fast-start failover first, remove the target configuration and then the database can be removed from the configuration.
disable fast_start failover edit configuration set protection mode as maxperformance; edit database <current_Primary> set property FastStartFailoverTarget=''; remove database <current_standby>
One more thing about ordering: Create the database cluster resource before duplication, and create the relevant paths in ASM for the database so you can place the passwordfile there and have clusterware be aware of that fact. When you use orapwd to place a password file on ASM, orapwd will automagically update the database cluster resource in clusterware so the passwordfile can be found on ASM. In addition, the SPFILE will also be registered in the same resource, so the duplication will take care of the SPFILE as well.
A final note, and I’ve mentioned this earlier: If there are issues with dataguard and the alert log for your databases are not telling you why, change the log_archive_trace level using DGMGRL, see the Oracle documentation for the usage.
In conclusion I would like to thank Edwin for managing this part of the migration project and dealing with the project management tasks to get things done.