Recovering disk failures that comprise undo segements


This is just a snapshot of some sitation that happened quite a while ago on a 10g oracle database. I’m not going to dive into the details that much, just providing some selects to decipher the scenario and show up one of the probable ways of resolving the problem.

Jep, due to a disk failure, as i noticed during the analysis, some statements and things complained about datafiles being offline. An attempt to re-online some datafile resulted in an ORA-00376 error as follows:

SQL> ALTER DATABASE DATAFILE 'H:\app_ORACLE\SAN_4\ORADATA\appxyz\stuff\app_UTL_06.DBF' ONLINE;
ALTER DATABASE DATAFILE 'H:\app_ORACLE\SAN_4\ORADATA\appxyz\stuff\app_UTL_06.DBF' ONLINE
*
FEHLER in Zeile 1:
ORA-00376: Datei 6 kann zur Zeit nicht gelesen werden
ORA-01110: Datendatei 6: 'H:\app_ORACLE\SAN_3\ORADATA\appxyz\UNDOTBS03.DBF'

Looking up dba_data_files produced this output:

SQL> column file_name format a50
SQL> select file_id, status, online_status, file_name from dba_data_files
  where online_status != 'ONLINE' order by file_name;

FILE_ID  STATUS     ONLINE_  FILE_NAME
-------  ---------  -------  ---------------------------------------------------------
      1  AVAILABLE  SYSTEM   H:\app_ORACLE\SAN_3\ORADATA\appxyz\SYSTEM01.DBF
      5  AVAILABLE  RECOVER  H:\app_ORACLE\SAN_3\ORADATA\appxyz\UNDOTBS02.DBF
      6  AVAILABLE  RECOVER  H:\app_ORACLE\SAN_3\ORADATA\appxyz\UNDOTBS03.DBF
     58  AVAILABLE  RECOVER  H:\app_ORACLE\SAN_3\ORADATA\appxyz\UNDOTBS04.DBF
     65  AVAILABLE  RECOVER  H:\app_ORACLE\SAN_3\ORADATA\appxyz\UNDOTBS05.DBF
     10  AVAILABLE  RECOVER  H:\app_ORACLE\SAN_4\ORADATA\appxyz\stuff\app_UTL_01.DBF
     47  AVAILABLE  RECOVER  H:\app_ORACLE\SAN_4\ORADATA\appxyz\stuff\app_UTL_02.DBF
     50  AVAILABLE  RECOVER  H:\app_ORACLE\SAN_4\ORADATA\appxyz\stuff\app_UTL_03.DBF
     60  AVAILABLE  RECOVER  H:\app_ORACLE\SAN_4\ORADATA\appxyz\stuff\app_UTL_05.DBF
     55  AVAILABLE  RECOVER  H:\app_ORACLE\SAN_4\ORADATA\appxyz\stuff\app_UTL_04.DBF
     86  AVAILABLE  RECOVER  H:\app_ORACLE\SAN_4\ORADATA\appxyz\stuff\app_UTL_06.DBF

Further on, there is another file status according to v$datafile:

SQL> column name format a50
SQL> select FILE#, status, name from v$datafile
  where status in ('OFFLINE','RECOVER') order by name;

    FILE#  STATUS   NAME
---------  -------  ---------------------------------------------------------
        5  OFFLINE  H:\app_ORACLE\SAN_3\ORADATA\appxyz\UNDOTBS02.DBF
        6  OFFLINE  H:\app_ORACLE\SAN_3\ORADATA\appxyz\UNDOTBS03.DBF
       58  OFFLINE  H:\app_ORACLE\SAN_3\ORADATA\appxyz\UNDOTBS04.DBF
       65  OFFLINE  H:\app_ORACLE\SAN_3\ORADATA\appxyz\UNDOTBS05.DBF
       10  OFFLINE  H:\app_ORACLE\SAN_4\ORADATA\appxyz\stuff\app_UTL_01.DBF
       47  OFFLINE  H:\app_ORACLE\SAN_4\ORADATA\appxyz\stuff\app_UTL_02.DBF
       50  OFFLINE  H:\app_ORACLE\SAN_4\ORADATA\appxyz\stuff\app_UTL_03.DBF
       55  OFFLINE  H:\app_ORACLE\SAN_4\ORADATA\appxyz\stuff\app_UTL_04.DBF
       60  OFFLINE  H:\app_ORACLE\SAN_4\ORADATA\appxyz\stuff\app_UTL_05.DBF
       86  OFFLINE  H:\app_ORACLE\SAN_4\ORADATA\appxyz\stuff\app_UTL_06.DBF

Finally then, on the segments layer, one may inspect what segments do actually need recovery bei selecting from dba_rollback_segs:

SQL> select tablespace_name, segment_name, status from dba_rollback_segs
  where status = 'NEEDS RECOVERY';

TABLESPACE_NAME     SEGMENT_NAME      STATUS
------------------- ----------------- ----------------
UNDOTBS1            _SYSSMU13$        NEEDS RECOVERY
UNDOTBS1            _SYSSMU16$        NEEDS RECOVERY
UNDOTBS1            _SYSSMU17$        NEEDS RECOVERY
UNDOTBS1            _SYSSMU22$        NEEDS RECOVERY
UNDOTBS1            _SYSSMU23$        NEEDS RECOVERY
UNDOTBS1            _SYSSMU24$        NEEDS RECOVERY
UNDOTBS1            _SYSSMU25$        NEEDS RECOVERY
UNDOTBS1            _SYSSMU26$        NEEDS RECOVERY
UNDOTBS1            _SYSSMU27$        NEEDS RECOVERY
UNDOTBS1            _SYSSMU28$        NEEDS RECOVERY
UNDOTBS1            _SYSSMU29$        NEEDS RECOVERY
UNDOTBS1            _SYSSMU30$        NEEDS RECOVERY
UNDOTBS1            _SYSSMU31$        NEEDS RECOVERY
UNDOTBS1            _SYSSMU32$        NEEDS RECOVERY
UNDOTBS1            _SYSSMU33$        NEEDS RECOVERY
UNDOTBS1            _SYSSMU36$        NEEDS RECOVERY
UNDOTBS1            _SYSSMU37$        NEEDS RECOVERY

So far, one notices that just the undo suffered from a disk failre but oracle also offlined other regular datafiles that reference the undo segments in question (see the four stati selected with several statements above).

Ok, this is a situation, that needs a closed database for recovery but using the standard way of closing the database will not work:

SQL> alter database close;
alter database close
*
FEHLER in Zeile 1:
ORA-00376: Datei 58 kann zur Zeit nicht gelesen werden
ORA-01110: Datendatei 58: 'H:\app_ORACLE\SAN_3\ORADATA\appxyz\UNDOTBS04.DBF'

However, for the curious of us, doing static selects is still possible:

SQL> select sysdate from dual;
--------
01.05.10

So i at least attempted to execute a shutdown that stopped the job queue and the dispatchers and stuff (as monitored in the alert log):

SQL> shutdown immediate;
ORA-00376: Datei 5 kann zur Zeit nicht gelesen werden
ORA-01110: Datendatei 5: 'H:\app_ORACLE\SAN_3\ORADATA\appxyz\UNDOTBS02.DBF'

What follows is a step-by-step protocol of the actions to resove the problem. These steps preassume an oracle on windows, to understand the services and registry steps, and a ready-to-go rman backup, to just do an automatic database recovery.

  1. Stop the windows services for the database and the listener.
  2. Set the autostart entry for the database to false (see HKLM/SOFTWARE/ORACLE/KEY_OraDb10g_home1/ORA_appxyz_AUTOSTART).
  3. Start the windows service for the database.
  4. Connect to the database as sysdba (sqlplus "/ as sysdba").
  5. Just mount the database (SQL> startup mount;).
  6. Execute a database wide recovery (SQL> recover database;), devine rman will care for any action by itself.
  7. Open the database (SQL> alter database open;).
  8. Online the undo files in question:
    	SQL> ALTER DATABASE DATAFILE 'H:\app_ORACLE\SAN_3\ORADATA\appxyz\UNDOTBS03.DBF' ONLINE;
    	
  9. Online the data files in question:
    	SQL> ALTER DATABASE DATAFILE 'H:\app_ORACLE\SAN_4\ORADATA\appxyz\stuff\app_UTL_06.DBF' ONLINE;
    	
  10. Reset the autostart entry for the database to the original value.
  11. Start the windows service for the listener.

That’s it. Have fun.

Advertisements

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s