How a double entry for LAST_ARCHIVE_TIMESTAMP can stop your audit purge in 12c process stop 3

How a double entry for LAST_ARCHIVE_TIMESTAMP can stop your audit purge in 12c

In the last nine month we were busy migrating our 11g databases to 12c rel. 2. Due to time constrictions we decided early in the project to migrate 1:1. This meant that we did not make use of the CDB/PDB-feature. During the course of the upgrades my colleagues encountered a couple of “special features” or “points of attention”, about which they will report in their own blog articles. Here, I like to highlight a special effect that hit our attention quite some time after the upgrades but should have been remedied during the post-upgrade checks, had we found any hints about it during our preparations.

What had happened?

About three month after the upgrade of a medium sized 11.2.0.4 two-node RAC database with active DataGuard to 12c (12.2.0.1) the addition of two datafiles to the AUDIT_DATA tablespace became necessary. On all our databases the AUDIT_DATA tablespace hosts the AUD$ table of the Standard Audit Trail. Shortly aftwards, the same tablespace had to be enlarged again. That was finally the trigger to research why the audit data was exploding on this very database.

First, a check in view DBA_AUDIT_MGMT_CLEAN_EVENTS showed us that the last purge action registered in this view originated from the night before the upgrade. The Scheduler Job STANDARD_AUDIT_TRAIL_PJ, we were using for the purge, was still running daily and seemingly without faults. But the AUD$-table kept growing, while the data was slowly suffocating the performance of the whole database.

The job history in the Enterprise Manager indicated that all runs of the purge job since the upgrade were executed with a wrong timestamp. That was very odd, because we use a second job called ADVANCE_ARCHIVE_TIMESTAMP to update the timestamp daily around midnight. A quick look into DBA_AUDIT_MGMT_LAST_ARCH_TS came back with this:

AUDIT_TRAIL          RAC_INSTANCE LAST_ARCHIVE_TS                               DATABASE_ID CONTAINER_GUID 
-------------------- ------------ --------------------------------------------- ----------- ---------------------------------
STANDARD AUDIT TRAIL 0 19-AUG-18 02.16.11.000000 PM +00:00 2596755560 00000000000000000000000000000000
STANDARD AUDIT TRAIL 0 07-JAN-19 01.16.12.000000 PM +00:00 2596755560 74680F7FF451428EE05317D5970AE17C
2 rows selected.
SQL>

The first row corresponds to the night before the migration minus a 7-day retention; the second was the current date minus a retention of 7 days. It was possible to change the second row using the DBMS_AUDIT_MGMT.SET_LAST_ARCHIVE_TIMESTAMP procedure the job ADVANCE_ARCHIVE_TIMESTAMP is using as well. Tests showed that exclusively the first row was used by the purge job.

Where did the strange first row come from?

An Oracle 11g database does not use containers and when the database was migrated 1:1 to 12c, the underlying table DAM_LAST_ARCH_TS$ was changed and the CONTAINER_GUID added. Only, there was not container and the ID was set to a default of ‘00000000000000000000000000000000’.

When the job ADVANCE_ARCHIVE_TIMESTAMP comes along for first time after the migration, it sets a new value for the LAST_ARCHIVE_TIMESTAMP but lets the existing one intact. The reason seems to be, that it cannot find the container_id and behaves as if the existing value is for a different audit trail. The new value is added, not replacing the old, which is the default behavior for a combination of audit trail, RAC instance and CONTAINER_ID. The result is a double entry for the same audit trail, of which only the first in line is used. Here, it is the one with the wrong timestamp.

Oracle seems to assume that there is only one entry for each audit trail and the only way to repair this, is to get rid of this wrong entry:

SQL>  delete from DBA_AUDIT_MGMT_LAST_ARCH_TS where CONTAINER_GUID = ‘00000000000000000000000000000000’;

commit;

What did we learn?

Don’t forget to add a check of your Audit parameters to your post-migration checks to 12c. Have a quick look into DAM_LAST_ARCH_TS$ to your post-migration checklist a day or two after the migration to check the purge job is running correctly. It could save a you lot of space and time you need to clean up your audit trail manually if you don’t…