When checking the status of the database backups on a bare-metal ODA (X8-2-HA), we noticed an error in the RMAN log files:
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of sql statement command at 09/03/2024 20:40:03
ORA-01565: error in identifying file '+FLASH/CDB1P12/PARAMETERFILE/spfile.1196.1121944743'
ORA-17503: ksfdopn:10 Failed to open file +FLASH/CDB1P12/PARAMETERFILE/spfile.1196.1121944743
ORA-01034: ORACLE not available
ORA-27121: unable to determine size of shared memory segment
Linux-x86_64 Error: 13: Permission denied
Additional information: 8703
Additional information: 65538
ORA-15064: communication failure with ASM instance
ORA-59518: Client instance CDB1P12:CDB1P12:ODA-1-c is already connected to this AS
RMAN> exit;
This ODA was recently patched to version 19.19, and there is a known issue where in some cases the SUID bit on some executables of the Grid Infrastructure is not set properly; see also MOS Doc 1487382.1. After solving this issue we successfully made backups of all the databases on that ODA. But then there was another problem.
One of the PDBs has a refreshable PDB on a second ODA for backup purposes. The PDB is refreshed every two minutes. However, after the earlier issues with ASM the refresh no longer worked, and there was a recurring error in the alert log of the source PDB:
...
2024-09-04T13:18:48.162001+02:00
PDBP01(3):Errors in file /u01/app/odaorabase/oracle/diag/rdbms/cdb1p12/CDB1P12/trace/CDB1P12_ora_61282.trc:
ORA-00308: cannot open archived log '+FLASH/cdb1p12/arc10/parlog_1_1247_2ad719eb_1121944339.arc'
ORA-27009: cannot write to file opened for read
...
This error occurred every two minutes and also created a lot of trace files, which led to space problems on the filesystem.
In the alert log of the refreshable PDB there were also recurring errors:
...
2024-09-04T13:18:45.453931+02:00
alter pluggable database pdbp01d refresh
2024-09-04T13:18:48.167238+02:00
ORA-17628 signalled during: alter pluggable database pdbp01d refresh...
...
We tried to manually refresh the PDB in SQL*Plus, but that threw another error:
SQL> alter pluggable database pdbp01d refresh;
ERROR at line 1:
ORA-17628: Oracle error 308 returned by remote Oracle server
ORA-00308: cannot open archived log ''
So an archived log with an empty name cannot be opened, and the ORA-00308 error itself gives little further information. The next step was to drop the refreshable PDB and create a new one, which we expected to be much faster than investigating the root cause. Wrong: the 'create pluggable database… refresh mode' statement immediately threw an error:
ERROR at line 1:
ORA-17628: Oracle error 308 returned by remote Oracle server
ORA-00308: cannot open archived log ''
Not what we had hoped for, so the search for the root cause continued. After quite a lot of research and trial and error, the solution was found. The cause was a set of settings in the log_archive_dest parameters. We found that the values were:
log_archive_dest_1                   string      LOCATION=USE_DB_RECOVERY_FILE_
                                                 DEST VALID_FOR=(ALL_LOGFILES,A
                                                 LL_ROLES) MAX_FAILURE=1 REOPEN
                                                 =5 DB_UNIQUE_NAME=CDB1P12 ALTE
                                                 RNATE=log_archive_dest_10
log_archive_dest_10                  string      LOCATION=+FLASH/CDB1P12/arc10
                                                 VALID_FOR=(ALL_LOGFILES,ALL_RO
                                                 LES) DB_UNIQUE_NAME=CDB1P12 AL
                                                 TERNATE=log_archive_dest_1
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
log_archive_dest_state_1             string      ENABLE
log_archive_dest_state_10            string      ALTERNATE
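The parameters only show what is configured. The runtime state of each destination can also be read from the V$ARCHIVE_DEST view. The query below is a minimal sketch, not something taken from our troubleshooting session; it assumes the destination numbers 1 and 10 shown above and a session that is allowed to query the V$ views:

```sql
-- Sketch: configured target, runtime status and last error
-- for archive destinations 1 and 10 (numbers taken from the output above).
SELECT dest_id,
       status,        -- for example VALID, ALTERNATE, ERROR or DISABLED
       destination,   -- where this destination writes archived logs
       alternate,     -- the destination configured as its alternate
       error          -- last error recorded for this destination, if any
  FROM v$archive_dest
 WHERE dest_id IN (1, 10)
 ORDER BY dest_id;
```

A destination that has failed over to its alternate would be expected to show up here with a status other than VALID.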
In ASM we also saw that the archives were written to the alternate destination in the FLASH disk group and not to RECO as we expected:
asmcmd> ls -l
ARCHIVELOG MIRROR COARSE SEP 04 13:00:00 N 1_1265_1121944339.dbf => +FLASH/CDB1P12/ARCHIVELOG/2024_09_04/thread_1_seq_1265.1998.1178804987
ARCHIVELOG MIRROR COARSE SEP 04 13:00:00 N 1_1266_1121944339.dbf => +FLASH/CDB1P12/ARCHIVELOG/2024_09_04/thread_1_seq_1266.1999.1178804987
ARCHIVELOG MIRROR COARSE SEP 04 13:00:00 N parlog_1_1247_2ad719eb_1121944339.arc => +FLASH/CDB1P12/partial_archivelog/2024_09_04/thread_1_seq_1247.1145.1178804039
ARCHIVELOG MIRROR COARSE SEP 04 13:00:00 N parlog_1_1252_2ad719eb_1121944339.arc => +FLASH/CDB1P12/partial_archivelog/2024_09_04/thread_1_seq_1252.838.1178804221
ARCHIVELOG MIRROR COARSE SEP 04 13:00:00 N parlog_1_1253_2ad719eb_1121944339.arc => +FLASH/CDB1P12/partial_archivelog/2024_09_04/thread_1_seq_1253.1135.1178804341
ARCHIVELOG MIRROR COARSE SEP 04 13:00:00 N parlog_1_1256_2ad719eb_1121944339.arc => +FLASH/CDB1P12/partial_archivelog/2024_09_04/thread_1_seq_1256.1987.1178804521
ARCHIVELOG MIRROR COARSE SEP 04 13:00:00 N parlog_1_1260_2ad719eb_1121944339.arc => +FLASH/CDB1P12/partial_archivelog/2024_09_04/thread_1_seq_1260.1992.1178804759
ARCHIVELOG MIRROR COARSE SEP 04 13:00:00 N parlog_1_1267_2ad719eb_1121944339.arc => +FLASH/CDB1P12/partial_archivelog/2024_09_04/thread_1_seq_1267.2000.1178804999
The main problem here is that the refreshable PDB creates a link to the latest available archived log. In the sample output above this is, for example, parlog_1_1267_[something].arc, which is a link to the archived log with sequence 1267. The only problem is that the archived log file with sequence 1267 is not yet available in ASM. That is what caused the ORA-00308: cannot open archived log '' error.
Whenever we forced a log switch in the source database, the automatic refresh succeeded. Apparently, the refreshable PDB has an issue with the recovery file destination pointing to FLASH storage in ASM.
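As an illustration, the log switch workaround could look like the sketch below. These are not necessarily the exact statements we used: ALTER SYSTEM SWITCH LOGFILE would be an alternative for the first step, and the sketch assumes the first statement is run in the root container of the source CDB and the second one on the second ODA while the refreshable PDB is closed.

```sql
-- On the source CDB (CDB$ROOT): archive the current online redo log, so that
-- the sequence the refresh is waiting for becomes a complete archived log.
ALTER SYSTEM ARCHIVE LOG CURRENT;

-- On the CDB that hosts the refreshable PDB: trigger a manual refresh.
-- The refreshable PDB has to be closed for this.
ALTER PLUGGABLE DATABASE pdbp01d REFRESH;
```

This only treats the symptom: the next refresh hits the same error as soon as it again needs a sequence that has not been archived yet.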
The solution was to reset the parameters:
Reset log_archive_dest_10:
SQL> alter system reset log_archive_dest_10;
Set log_archive_dest_1 without the alternate option:
SQL> alter system set log_archive_dest_1="LOCATION=USE_DB_RECOVERY_FILE_DEST VALID_FOR=(ALL_LOGFILES,ALL_ROLES) MAX_FAILURE=1 REOPEN=5 DB_UNIQUE_NAME=CDB1P12";
Re-enable log_archive_dest_1:
SQL> alter system set log_archive_dest_state_1='ENABLE';
From that moment on the media recovery on the refreshable PDB worked again. Evidently, the archives had been written to the alternate destination because of the earlier problems with ASM, but the archiver process did not switch back once those problems were solved.
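To confirm where newly archived logs end up after such a change, a query along the following lines could be used on the source CDB. It is a sketch only; the one-hour window is an arbitrary assumption:

```sql
-- Sketch: list the archived logs completed during the last hour,
-- together with the destination number they were written to.
SELECT thread#,
       sequence#,
       dest_id,
       name,
       completion_time
  FROM v$archived_log
 WHERE completion_time > SYSDATE - 1/24
 ORDER BY thread#, sequence#, dest_id;
```

With the parameters set as described above, new entries would be expected to show dest_id 1 and a name under the recovery file destination instead of +FLASH/CDB1P12/arc10.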
Thanks to my colleague Mark Koreman for the troubleshooting.