Oracle Cloud – DBaaS instance down for no apparent reason – (how archive log mode and storage shortage forced the instance to its knees)

1

My Oracle Public Cloud DBaaS instance did not respond. On closer investigation, it turned out to be down. Mounting the database ended with an ORA-3113 End of Communication Channel. Investigation of the startup log and the database alerts indicated that no new archive log files could be created. The DBaaS instance is in archive log mode – and has created quite a chunk of archive log files over the last few months – now causing the storage to run out of free space.

I am not much of a DBA. I have spent quite some time on investigating the issue. And my ultimate guide was Marco Gralike who described the very problem I was running into in this blog article: Oracle Cloud – Managing the Database as a Service Cloud. He explained what the issue was and hinted at the steps to resolve it. Of course I need more than a hint, so I had to do some more searching – meeting RMAN as part of my efforts.

I have altered the database – switching of archive log mode – as a temporary measure. I could then startup the database and start using it again. That was a relief!

Next, I used RMAN to remove some of the backup files and some of the archivelog files; this was an uncharted area for me.

Alter DBaaS instance – switch off archive log mode

I have started Putty Agent (Pageant) and loaded my private key of the Oracle Public Cloud key pair into the Pageant session. I then used Putty to connect over SSH to the public IP address for the compute node for my DBaaS instance. I logged in as oracle. No password is required.

I can now start SQLPlus:

sqlplus ‘/ as sysdba’

Just as Marco writes, the database instance is not running. It can be mounted but not opened.

After starting up and mounting:

startup mount

I can check (archive log list) and change the archive log mode (alter database noarchivelog):

image

Note how I first have to turn flashback off. Finally I can open the database and start using it again. It runs now with archivelog and flashback disabled. I can turn those back on once I have recovered some storage space.

Recover Storage by Removing (Old) Backup Sets and Archivelog files

In the end, archivelogs and backup sets are files on the file system. And on my file system they occupied many GBs. However, it is not recommended (!) to just start removing files in order to free up space. Instead, management of these files should be done through RMAN.

In the SSH session described in the previous section, I can start RMAN:

rman target /

and have it connect to the database indicated in my session (oraenv, $ORACLE_HOME, $ORACLE_SID).

image

I have used  the crosscheck command to “synchronize the physical reality of backups and copies with their logical records in the RMAN repository”.

crosscheck backupset;

This takes quite a while. In the end, RMAN reports back:

SNAGHTML1489005

At this point I can start removing files that are no longer required – or have RMAN do so for me. I use this command to free up space currently occupied by any file describing the

DELETE NOPROMPT OBSOLETE RECOVERY WINDOW OF 7 DAYS;

image

This deletes backups and copies that are not needed to recover the database to an arbitrary SCN longer ago than last week. RMAN also deletes archived redo log files that are no longer needed. This command took a really long time to complete ( more than one hour), and deleted an impressive number of files.

image

And subsequently:

image

I probably should do more to set a tighter ted policy for backing up – and purging backups and archive log files. Maybe ask a colleague who knows more about RMAN and Database Backup (& Recovery). It is really not my turf.

Let’s try:

CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 15 DAYS;
image
 

Let’s do the CROSSCHECK:

crosscheck backupset;

and subsequently delete the obsolete backup stuff:

delete obsolete

 

The file system by the way looks happier now – I think.

image

And I think I can now turn on ARCHIVELOG mode and FLASHBACK again, through SQL*Plus as sysdba:

image

 

Resources

Blog article ORA-38774: cannot disable media recovery – when alter database noarchivelog – flashback database is enabled – https://doganay.wordpress.com/2012/03/14/alter-database-noarchivelog-ora-38774-cannot-disable-media-recovery-flashback-database-is-enabled/ 

Blog article: How Do I Start Oracle Service in UNIX or Linux? – http://www.cyberciti.biz/faq/how-do-i-start-oracle-service-in-unix/ 

Blog article by Marco Gralike – Oracle Cloud – Managing the Database as a Service Cloud

Forum: “ORA-03113: end-of-file on communication channel” on startuphttp://dba.stackexchange.com/questions/49682/ora-03113-end-of-file-on-communication-channel-on-startup 

How to Delete Old Obsolete and Expired Oracle RMAN Backup – by Ramesh Natarajan – http://www.thegeekstuff.com/2015/01/delete-oracle-rman-backup/

Oracle Distilled Article: Enable/Disable Archive Log Mode 10g/11g – http://www.oracledistilled.com/oracle-database/backup-and-recovery/enabledisable-archive-log-mode-10g11g/

Documentation on the Delete command in RMAN: http://docs.oracle.com/cd/E11882_01/backup.112/e10643/rcmsynta016.htm#RCMRF121

Documentation on Crosscheck command in RMAN: http://docs.oracle.com/cd/E11882_01/backup.112/e10643/rcmsynta015.htm#RCMRF119

About Author

Lucas Jellema, active in IT (and with Oracle) since 1994. Oracle ACE Director and Oracle Developer Champion. Solution architect and developer on diverse areas including SQL, JavaScript, Kubernetes & Docker, Machine Learning, Java, SOA and microservices, events in various shapes and forms and many other things. Author of the Oracle Press book Oracle SOA Suite 12c Handbook. Frequent presenter on user groups and community events and conferences such as JavaOne, Oracle Code, CodeOne, NLJUG JFall and Oracle OpenWorld.

1 Comment

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.