Thursday, March 7, 2013

Fixing failures in Oracle 11g with DRA and RMAN

The Data Recovery Advisor was introduced in Oracle 11g, a feature that gives advice regarding database failures and also repairs them if possible. The DRA relies in the Health Monitor for diagnostics and can be used by means of RMAN or Enterprise Manager. It looks fine but since I don't have a lot of faulty databases I don't have experience with it; however, I have to study it in order to get the OCP 11g certification upgrade. At first look it seems like a good feature for administrators in a hurry or without so much experience, but personally I prefer to do things by hand and by myself.

Anyway, if you want to check the DRA capabilities and you have a database for practicing it might be worth the time trying it. The following is an exercise for doing that; first of all, back up your database because backups are an important part of recovering using the DRA. Next, shut down your database and rename (or delete if you're faithful enough) a datafile, and try to start up your database:

oracle@olimpo:~$ sqlplus '/ as sysdba'

SQL*Plus: Release 11.2.0.1.0 Production on Wed Mar 6 13:35:56 2013

Copyright (c) 1982, 2009, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup
ORACLE instance started.

Total System Global Area  263639040 bytes
Fixed Size      1335892 bytes
Variable Size    213913004 bytes
Database Buffers    41943040 bytes
Redo Buffers      6447104 bytes
Database mounted.
ORA-01157: cannot identify/lock data file 5 - see DBWR trace file
ORA-01110: data file 5: '/opt/oracle/oradata/orcl11g/example01.dbf'

The database didn't open as expected so the next step is running RMAN to use the DRA. As you might know, it's necessary to have the database at least mounted in order to be able to use RMAN:

oracle@olimpo:~$ rman target /

Recovery Manager: Release 11.2.0.1.0 - Production on Wed Mar 6 13:36:56 2013

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

connected to target database: ORCL11G (DBID=915822427, not open)

The next step is listing the current known failures:

RMAN> list failure;

using target database control file instead of recovery catalog
List of Database Failures
=========================

Failure ID Priority Status    Time Detected Summary
---------- -------- --------- ------------- -------
42         HIGH     OPEN      06-MAR-13     One or more non-system datafiles are missing

If you want to know more details about certain failure you can do it as well:

RMAN> list failure 42 detail;

List of Database Failures
=========================

Failure ID Priority Status    Time Detected Summary
---------- -------- --------- ------------- -------
42         HIGH     OPEN      06-MAR-13     One or more non-system datafiles are missing
  Impact: See impact for individual child failures
  List of child failures for parent failure ID 42
  Failure ID Priority Status    Time Detected Summary
  ---------- -------- --------- ------------- -------
  145        HIGH     OPEN      06-MAR-13     Datafile 5: '/opt/oracle/oradata/orcl11g/example01.dbf' is missing
    Impact: Some objects in tablespace EXAMPLE might be unavailable

The Health Monitor runs diagnostics automatically when some high or critical failure happens, but if for some reason there is not a diagnostic performed first then you cannot get advice regarding a failure. The next step involves getting advice regarding the open failures:

RMAN> advise failure;

List of Database Failures
=========================

Failure ID Priority Status    Time Detected Summary
---------- -------- --------- ------------- -------
42         HIGH     OPEN      06-MAR-13     One or more non-system datafiles are missing
  Impact: See impact for individual child failures
  List of child failures for parent failure ID 42
  Failure ID Priority Status    Time Detected Summary
  ---------- -------- --------- ------------- -------
  145        HIGH     OPEN      06-MAR-13     Datafile 5: '/opt/oracle/oradata/orcl11g/example01.dbf' is missing
    Impact: Some objects in tablespace EXAMPLE might be unavailable

analyzing automatic repair options; this may take some time
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=133 device type=DISK
analyzing automatic repair options complete

Mandatory Manual Actions
========================
no manual actions available

Optional Manual Actions
=======================
1. If file /opt/oracle/oradata/orcl11g/example01.dbf was unintentionally renamed or moved, restore it

Automated Repair Options
========================
Option Repair Description
------ ------------------
1      NOARCHIVELOG mode restore datafile 5  
  Strategy: The repair includes complete media recovery with no data loss
  Repair script: /opt/oracle/diag/rdbms/orcl11g/orcl11g/hm/reco_1400180561.hm

As you might guess, the DRA first checks relevant and mandatory prerequisites before giving any automated repair option, like database backups. The optional manual actions are recommendations about solving the failure by hand, instead of letting the ADR fix the failures automatically. Sometimes you have to carry on mandatory manual actions before being able to do any manual or automated repair, and in this case there is no mandatory manual actions to do.

Moreover, there are two kinds of repair options: with no data loss and with data loss, be sure to check the strategy to know if the selected repair option involves no data loss, or at least to understand what you're going to lose if you select that option in case there is no better option.

Also, if you want you can check and modify the repair script listed before doing any repair, or you can preview the repair procedure without doing any repair at all this way:

RMAN> repair failure preview;

Strategy: The repair includes complete media recovery with no data loss
Repair script: /opt/oracle/diag/rdbms/orcl11g/orcl11g/hm/reco_1400180561.hm

contents of repair script:
   # NOARCHIVELOG mode restore datafile
   restore datafile 5;
   recover datafile 5;

If you're happy with the proposed automatic repair options then you can repair the open database failures this way:

RMAN> repair failure;

Strategy: The repair includes complete media recovery with no data loss
Repair script: /opt/oracle/diag/rdbms/orcl11g/orcl11g/hm/reco_1400180561.hm

contents of repair script:
   # NOARCHIVELOG mode restore datafile
   restore datafile 5;
   recover datafile 5;

Do you really want to execute the above repair (enter YES or NO)? YES
executing repair script

Starting restore at 06-MAR-13
using channel ORA_DISK_1

channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00005 to /opt/oracle/oradata/orcl11g/example01.dbf
channel ORA_DISK_1: reading from backup piece /opt/oracle/flash_recovery_area/ORCL11G/backupset/2013_03_05/o1_mf_nnndf_TAG20130305T165413_8mdxwq5t_.bkp
channel ORA_DISK_1: piece handle=/opt/oracle/flash_recovery_area/ORCL11G/backupset/2013_03_05/o1_mf_nnndf_TAG20130305T165413_8mdxwq5t_.bkp tag=TAG20130305T165413
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:07
Finished restore at 06-MAR-13

Starting recover at 06-MAR-13
using channel ORA_DISK_1

starting media recovery
media recovery complete, elapsed time: 00:00:01

Finished recover at 06-MAR-13
repair failure complete

Do you want to open the database (enter YES or NO)? YES
database opened

RMAN> exit


Recovery Manager complete.

You have to confirm that you really want to execute repairs, and as a final step you might want to open the database before leaving RMAN. And that's all! In this case everything went fine and the database was repaired with no data loss:

oracle@olimpo:~$ sqlplus '/ as sysdba'

SQL*Plus: Release 11.2.0.1.0 Production on Wed Mar 6 13:43:10 2013

Copyright (c) 1982, 2009, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> column INSTANCE_NAME format a20
SQL> column STATUS format a20
SQL> column DATABASE_STATUS format a20
SQL> select INSTANCE_NAME, STATUS, DATABASE_STATUS from v$instance;

INSTANCE_NAME      STATUS    DATABASE_STATUS
-------------------- -------------------- --------------------
orcl11g       OPEN    ACTIVE

More information:

Diagnosing and Repairing Failures with Data Recovery Advisor

No comments:

Post a Comment