Recovering disk failures that comprise undo segements
December 21, 2011 Leave a Comment
This is just a snapshot of some sitation that happened quite a while ago on a 10g oracle database. I’m not going to dive into the details that much, just providing some selects to decipher the scenario and show up one of the probable ways of resolving the problem.
Jep, due to a disk failure, as i noticed during the analysis, some statements and things complained about datafiles being offline. An attempt to re-online some datafile resulted in an ORA-00376 error as follows:
SQL> ALTER DATABASE DATAFILE 'H:\app_ORACLE\SAN_4\ORADATA\appxyz\stuff\app_UTL_06.DBF' ONLINE; ALTER DATABASE DATAFILE 'H:\app_ORACLE\SAN_4\ORADATA\appxyz\stuff\app_UTL_06.DBF' ONLINE * FEHLER in Zeile 1: ORA-00376: Datei 6 kann zur Zeit nicht gelesen werden ORA-01110: Datendatei 6: 'H:\app_ORACLE\SAN_3\ORADATA\appxyz\UNDOTBS03.DBF'
Looking up dba_data_files produced this output:
SQL> column file_name format a50
SQL> select file_id, status, online_status, file_name from dba_data_files
where online_status != 'ONLINE' order by file_name;
FILE_ID STATUS ONLINE_ FILE_NAME
------- --------- ------- ---------------------------------------------------------
1 AVAILABLE SYSTEM H:\app_ORACLE\SAN_3\ORADATA\appxyz\SYSTEM01.DBF
5 AVAILABLE RECOVER H:\app_ORACLE\SAN_3\ORADATA\appxyz\UNDOTBS02.DBF
6 AVAILABLE RECOVER H:\app_ORACLE\SAN_3\ORADATA\appxyz\UNDOTBS03.DBF
58 AVAILABLE RECOVER H:\app_ORACLE\SAN_3\ORADATA\appxyz\UNDOTBS04.DBF
65 AVAILABLE RECOVER H:\app_ORACLE\SAN_3\ORADATA\appxyz\UNDOTBS05.DBF
10 AVAILABLE RECOVER H:\app_ORACLE\SAN_4\ORADATA\appxyz\stuff\app_UTL_01.DBF
47 AVAILABLE RECOVER H:\app_ORACLE\SAN_4\ORADATA\appxyz\stuff\app_UTL_02.DBF
50 AVAILABLE RECOVER H:\app_ORACLE\SAN_4\ORADATA\appxyz\stuff\app_UTL_03.DBF
60 AVAILABLE RECOVER H:\app_ORACLE\SAN_4\ORADATA\appxyz\stuff\app_UTL_05.DBF
55 AVAILABLE RECOVER H:\app_ORACLE\SAN_4\ORADATA\appxyz\stuff\app_UTL_04.DBF
86 AVAILABLE RECOVER H:\app_ORACLE\SAN_4\ORADATA\appxyz\stuff\app_UTL_06.DBF
Further on, there is another file status according to v$datafile:
SQL> column name format a50
SQL> select FILE#, status, name from v$datafile
where status in ('OFFLINE','RECOVER') order by name;
FILE# STATUS NAME
--------- ------- ---------------------------------------------------------
5 OFFLINE H:\app_ORACLE\SAN_3\ORADATA\appxyz\UNDOTBS02.DBF
6 OFFLINE H:\app_ORACLE\SAN_3\ORADATA\appxyz\UNDOTBS03.DBF
58 OFFLINE H:\app_ORACLE\SAN_3\ORADATA\appxyz\UNDOTBS04.DBF
65 OFFLINE H:\app_ORACLE\SAN_3\ORADATA\appxyz\UNDOTBS05.DBF
10 OFFLINE H:\app_ORACLE\SAN_4\ORADATA\appxyz\stuff\app_UTL_01.DBF
47 OFFLINE H:\app_ORACLE\SAN_4\ORADATA\appxyz\stuff\app_UTL_02.DBF
50 OFFLINE H:\app_ORACLE\SAN_4\ORADATA\appxyz\stuff\app_UTL_03.DBF
55 OFFLINE H:\app_ORACLE\SAN_4\ORADATA\appxyz\stuff\app_UTL_04.DBF
60 OFFLINE H:\app_ORACLE\SAN_4\ORADATA\appxyz\stuff\app_UTL_05.DBF
86 OFFLINE H:\app_ORACLE\SAN_4\ORADATA\appxyz\stuff\app_UTL_06.DBF
Finally then, on the segments layer, one may inspect what segments do actually need recovery bei selecting from dba_rollback_segs:
SQL> select tablespace_name, segment_name, status from dba_rollback_segs where status = 'NEEDS RECOVERY'; TABLESPACE_NAME SEGMENT_NAME STATUS ------------------- ----------------- ---------------- UNDOTBS1 _SYSSMU13$ NEEDS RECOVERY UNDOTBS1 _SYSSMU16$ NEEDS RECOVERY UNDOTBS1 _SYSSMU17$ NEEDS RECOVERY UNDOTBS1 _SYSSMU22$ NEEDS RECOVERY UNDOTBS1 _SYSSMU23$ NEEDS RECOVERY UNDOTBS1 _SYSSMU24$ NEEDS RECOVERY UNDOTBS1 _SYSSMU25$ NEEDS RECOVERY UNDOTBS1 _SYSSMU26$ NEEDS RECOVERY UNDOTBS1 _SYSSMU27$ NEEDS RECOVERY UNDOTBS1 _SYSSMU28$ NEEDS RECOVERY UNDOTBS1 _SYSSMU29$ NEEDS RECOVERY UNDOTBS1 _SYSSMU30$ NEEDS RECOVERY UNDOTBS1 _SYSSMU31$ NEEDS RECOVERY UNDOTBS1 _SYSSMU32$ NEEDS RECOVERY UNDOTBS1 _SYSSMU33$ NEEDS RECOVERY UNDOTBS1 _SYSSMU36$ NEEDS RECOVERY UNDOTBS1 _SYSSMU37$ NEEDS RECOVERY
So far, one notices that just the undo suffered from a disk failre but oracle also offlined other regular datafiles that reference the undo segments in question (see the four stati selected with several statements above).
Ok, this is a situation, that needs a closed database for recovery but using the standard way of closing the database will not work:
SQL> alter database close; alter database close * FEHLER in Zeile 1: ORA-00376: Datei 58 kann zur Zeit nicht gelesen werden ORA-01110: Datendatei 58: 'H:\app_ORACLE\SAN_3\ORADATA\appxyz\UNDOTBS04.DBF'
However, for the curious of us, doing static selects is still possible:
SQL> select sysdate from dual; -------- 01.05.10
So i at least attempted to execute a shutdown that stopped the job queue and the dispatchers and stuff (as monitored in the alert log):
SQL> shutdown immediate; ORA-00376: Datei 5 kann zur Zeit nicht gelesen werden ORA-01110: Datendatei 5: 'H:\app_ORACLE\SAN_3\ORADATA\appxyz\UNDOTBS02.DBF'
What follows is a step-by-step protocol of the actions to resove the problem. These steps preassume an oracle on windows, to understand the services and registry steps, and a ready-to-go rman backup, to just do an automatic database recovery.
- Stop the windows services for the database and the listener.
- Set the autostart entry for the database to false (see
HKLM/SOFTWARE/ORACLE/KEY_OraDb10g_home1/ORA_appxyz_AUTOSTART). - Start the windows service for the database.
- Connect to the database as sysdba (
sqlplus "/ as sysdba"). - Just mount the database (
SQL> startup mount;). - Execute a database wide recovery (
SQL> recover database;), devine rman will care for any action by itself. - Open the database (
SQL> alter database open;). - Online the undo files in question:
SQL> ALTER DATABASE DATAFILE 'H:\app_ORACLE\SAN_3\ORADATA\appxyz\UNDOTBS03.DBF' ONLINE;
- Online the data files in question:
SQL> ALTER DATABASE DATAFILE 'H:\app_ORACLE\SAN_4\ORADATA\appxyz\stuff\app_UTL_06.DBF' ONLINE;
- Reset the autostart entry for the database to the original value.
- Start the windows service for the listener.
That’s it. Have fun.
Recent Comments