Monday, May 20, 2013

Network Adapter Could not Establish Connection

This is a funny scenario: you have a java program that connects to an Oracle database using IP adresses, but cannot connect to the database even if you use the same exact connect string in tnsnames and you're able to connect to that database using the SQL/Plus client.

But let's start from the beginning, when you don't have any issues at all (because the Oracle database server name has a full-qualified domain name and is resolvable):

oracle@test:~$ cd $ORACLE_HOME/network/admin
oracle@test:~/app/oracle/product/10.2.0/server/network/admin$ cat tnsnames.ora 
# tnsnames.ora Network Configuration File:

XE =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = hera.localdomain)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = XE)
    )
  )

XE2 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.108.76)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = SHARED)
      (SERVICE_NAME = XE)
    )
  )

XE3 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.108.76)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = XE)
    )
  )

As you can see, there are three different connect strings defined in this tnsnames file: one with a FQDN as dedicated connection, one with an IP address as shared connection, and the last one with an IP address as dedicated connection. Now you test the name of the server and try to connect using these three services:

oracle@test:~$ ping hera
PING hera.localdomain (192.168.108.76) 56(84) bytes of data.
64 bytes from hera.localdomain (192.168.108.76): icmp_req=1 ttl=64 time=8.34 ms
64 bytes from hera.localdomain (192.168.108.76): icmp_req=2 ttl=64 time=3.58 ms

--- hera.localdomain ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1003ms
rtt min/avg/max/mdev = 3.589/5.964/8.340/2.376 ms

oracle@test:~$ sqlplus hr/hr@xe

SQL*Plus: Release 10.2.0.1.0 - Production on Wed May 15 15:44:16 2013

Copyright (c) 1982, 2005, Oracle.  All rights reserved.


Connected to:
Oracle Database 10g Express Edition Release 10.2.0.1.0 - Production

SQL> exit
Disconnected from Oracle Database 10g Express Edition Release 10.2.0.1.0 - Production

oracle@test:~$ sqlplus hr/hr@xe2

SQL*Plus: Release 10.2.0.1.0 - Production on Wed May 15 15:44:21 2013

Copyright (c) 1982, 2005, Oracle.  All rights reserved.


Connected to:
Oracle Database 10g Express Edition Release 10.2.0.1.0 - Production

SQL> exit
Disconnected from Oracle Database 10g Express Edition Release 10.2.0.1.0 - Production

oracle@test:~$ sqlplus hr/hr@xe3

SQL*Plus: Release 10.2.0.1.0 - Production on Wed May 15 15:44:26 2013

Copyright (c) 1982, 2005, Oracle.  All rights reserved.


Connected to:
Oracle Database 10g Express Edition Release 10.2.0.1.0 - Production

SQL> exit
Disconnected from Oracle Database 10g Express Edition Release 10.2.0.1.0 - Production

Everything was fine as expected, and even this little handy java test program is working:

oracle@test:~$ cat OracleJdbcExample.java |grep jdbc
        String url = "jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.108.76)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=XE)))";

oracle@test:~$ javac -g OracleJdbcExample.java 
oracle@test:~$ export CLASSPATH=$CLASSPATH:/usr/lib/oracle/MyJDBC/ojdbc6.jar
oracle@test:~$ java OracleJdbcExample
Current Date from Oracle : 2013-05-15 16:06:08
done

But what happens when the host name is not resolvable anymore? Like when commenting the nameserver lines in /etc/resolv.conf:

oracle@test:/etc# ping hera
ping: unknown host hera

oracle@test:~$ sqlplus hr/hr@xe

SQL*Plus: Release 10.2.0.1.0 - Production on Wed May 15 16:02:25 2013

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

ERROR:
ORA-12560: TNS:protocol adapter error


Enter user-name: 

oracle@test:~$ sqlplus hr/hr@xe2

SQL*Plus: Release 10.2.0.1.0 - Production on Wed May 15 16:02:44 2013

Copyright (c) 1982, 2005, Oracle.  All rights reserved.


Connected to:
Oracle Database 10g Express Edition Release 10.2.0.1.0 - Production

SQL> exit
Disconnected from Oracle Database 10g Express Edition Release 10.2.0.1.0 - Production

oracle@test:~$ sqlplus hr/hr@xe3

SQL*Plus: Release 10.2.0.1.0 - Production on Wed May 15 16:02:50 2013

Copyright (c) 1982, 2005, Oracle.  All rights reserved.


Connected to:
Oracle Database 10g Express Edition Release 10.2.0.1.0 - Production

SQL> exit
Disconnected from Oracle Database 10g Express Edition Release 10.2.0.1.0 - Production

As expected, the first connect string that uses the hostname is not working, but the other two that use IP addresses still work. Therefore, the java program has to work as well, right?

oracle@test:~$ java OracleJdbcExample
Exception in thread "main" java.sql.SQLRecoverableException: IO Error: Connection reset
 at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:421)
 at oracle.jdbc.driver.PhysicalConnection.(PhysicalConnection.java:531)
 at oracle.jdbc.driver.T4CConnection.(T4CConnection.java:221)
 at oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:32)
 at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:503)
 at java.sql.DriverManager.getConnection(DriverManager.java:582)
 at java.sql.DriverManager.getConnection(DriverManager.java:154)
 at OracleJdbcExample.main(OracleJdbcExample.java:25)
Caused by: java.net.SocketException: Connection reset
 at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96)
 at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
 at oracle.net.ns.DataPacket.send(DataPacket.java:199)
 at oracle.net.ns.NetOutputStream.flush(NetOutputStream.java:211)
 at oracle.net.ns.NetInputStream.getNextPacket(NetInputStream.java:227)
 at oracle.net.ns.NetInputStream.read(NetInputStream.java:175)
 at oracle.net.ns.NetInputStream.read(NetInputStream.java:100)
 at oracle.net.ns.NetInputStream.read(NetInputStream.java:85)
 at oracle.jdbc.driver.T4CSocketInputStreamWrapper.readNextPacket(T4CSocketInputStreamWrapper.java:122)
 at oracle.jdbc.driver.T4CSocketInputStreamWrapper.read(T4CSocketInputStreamWrapper.java:78)
 at oracle.jdbc.driver.T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1179)
 at oracle.jdbc.driver.T4CMAREngine.unmarshalSB1(T4CMAREngine.java:1155)
 at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:279)
 at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:186)
 at oracle.jdbc.driver.T4CTTIoauthenticate.doOAUTH(T4CTTIoauthenticate.java:366)
 at oracle.jdbc.driver.T4CTTIoauthenticate.doOAUTH(T4CTTIoauthenticate.java:752)
 at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:359)
 ... 7 more

That's not the case. According to Oracle's Metalink note 139775.1, when you're trying to connect to an Oracle service via a dedicated connection using java, the connect string has IP addresses instead of full-qualified domain names, and those IP addresses are not resolvable in the client machine, you will get a Network Adapter Could not Establish Connection error message, even if from that client you can reach the database using the SQL/Plus client. In this example the error message is different, I guess because I'm using Oracle 10g XE.

More information:

Io exception: The Network Adapter could not establish the connection
NL Exception trying to connect to 10g RAC w/JDBC

Friday, April 5, 2013

Resuming a failed file copy

Let's say that you want to copy a very big file, in slow or faulty media like a cheap USB stick or disk, and the copy fails leaving you with an incomplete file:

olimpo:/Linux_minimal# ls -la
total 18739092
drwx------ 2 root root        4096 Apr  3 16:22 .
drwxr-xr-x 8 root root        4096 Apr  3 15:11 ..
-rw-r--r-- 1 root root 19170066432 Apr  3 16:46 Linux.dsk
-rw-r--r-- 1 root root         211 May  2  2010 create.txt
-rwxr-xr-x 1 root root         154 Apr  2 16:02 launch

olimpo:/Linux_minimal# ls -la ../Linux_test2
total 19417644
drwxr-xr-x 2 root root        4096 Mar 28 16:55 .
drwxr-xr-x 8 root root        4096 Apr  3 15:11 ..
-rw-r--r-- 1 root root 19864223744 Apr  2 15:47 Linux.dsk
-rw-r--r-- 1 root root         211 May  2  2010 create.txt
-rwxr-xr-x 1 root root         154 Apr  2 16:02 launch

What a frustration! No matter how many times you try to copy the file, the copy operation aborts before doing the task.

Fortunately you can resume the copy with rsync:

olimpo:/Linux_minimal# rsync --progress --partial --append ../Linux_test2/Linux.dsk Linux.dsk
Linux.dsk
 19864223744 100%   10.87MB/s    0:01:00 (xfer#1, to-check=0/1)

sent 694242119 bytes  received 31 bytes  11107874.40 bytes/sec
total size is 19864223744  speedup is 28.61

And just to be sure, you can check the files with diff (the slow way) or md5sum (the quick way):

olimpo:/Linux_minimal# md5sum Linux.dsk
eccc1436fe7a10acb44973449fd430bc  Linux.dsk

olimpo:/Linux_minimal# md5sum ../Linux_test2/Linux.dsk
eccc1436fe7a10acb44973449fd430bc  ../Linux_test2/Linux.dsk

More information:

How To Resume Failed copy ( cp command ) where it left off?

Wednesday, March 13, 2013

Managing maintenance tasks in Oracle 11g

As you might know, if you want to change execution parameters of automated database maintenance tasks in Oracle 10g you have to modify the jobs and windows associated to those tasks; therefore, you might be tempted to change 11g maintenance tasks the same way but you'll find that there are no jobs that look like maintenance tasks, and the windows still exist but are disabled:

SQL> set linesize 120
SQL> column WINDOW_NAME format a20
SQL> column ENABLED format a7
SQL> column REPEAT_INTERVAL format a60
SQL> column DURATION format a15
SQL> select window_name, enabled, repeat_interval, duration from dba_scheduler_windows
where window_name like 'WEEK%';

WINDOW_NAME          ENABLED REPEAT_INTERVAL                                              DURATION
-------------------- ------- ------------------------------------------------------------ ---------------
WEEKNIGHT_WINDOW     FALSE   freq=daily;byday=MON,TUE,WED,THU,FRI;byhour=22;byminute=0; b +000 08:00:00
                             ysecond=0

WEEKEND_WINDOW      FALSE    freq=daily;byday=SAT;byhour=0;byminute=0;bysecond=0          +002 00:00:00

Did nobody tell you? In Oracle 11g you have to use the DBMS_AUTO_TASK_ADMIN procedures to control the three maintenance tasks: Automatic Optimizer Statistics Collection, Automatic Segment Advisor and Automatic SQL Tuning Advisor.

To check tasks information you can do this:

SQL> column CLIENT_NAME format a32
SQL> column STATUS format a10
SQL> column WINDOW_GROUP format a20
SQL> column ATTRIBUTES format a40
SQL> select client_name, status, window_group, attributes from dba_autotask_client;

CLIENT_NAME                      STATUS     WINDOW_GROUP         ATTRIBUTES
-------------------------------- ---------- -------------------- ----------------------------------------
auto optimizer stats collection  ENABLED    ORA$AT_WGRP_OS       ON BY DEFAULT, VOLATILE, SAFE TO KILL
auto space advisor               ENABLED    ORA$AT_WGRP_SA       ON BY DEFAULT, VOLATILE, SAFE TO KILL
sql tuning advisor               ENABLED    ORA$AT_WGRP_SQ       ONCE PER WINDOW, ON BY DEFAULT, VOLATILE
                                                                 , SAFE TO KILL

As you can see, all the three tasks are enabled by default and have its own window group, but all the window groups have the same window members, one per day of the week:

SQL> select window_name, enabled, repeat_interval, duration from dba_scheduler_windows where window_name in
(select WINDOW_NAME from dba_scheduler_wingroup_members where WINDOW_GROUP_NAME='ORA$AT_WGRP_SQ');

WINDOW_NAME          ENABLED REPEAT_INTERVAL                                              DURATION
-------------------- ------- ------------------------------------------------------------ ---------------
SATURDAY_WINDOW      TRUE    freq=daily;byday=SAT;byhour=6;byminute=0; bysecond=0         +000 20:00:00
WEDNESDAY_WINDOW     TRUE    freq=daily;byday=WED;byhour=22;byminute=0; bysecond=0        +000 04:00:00
THURSDAY_WINDOW      TRUE    freq=daily;byday=THU;byhour=22;byminute=0; bysecond=0        +000 04:00:00
SUNDAY_WINDOW        TRUE    freq=daily;byday=SUN;byhour=6;byminute=0; bysecond=0         +000 20:00:00
FRIDAY_WINDOW        TRUE    freq=daily;byday=FRI;byhour=22;byminute=0; bysecond=0        +000 04:00:00
TUESDAY_WINDOW       TRUE    freq=daily;byday=TUE;byhour=22;byminute=0; bysecond=0        +000 04:00:00
MONDAY_WINDOW        TRUE    freq=daily;byday=MON;byhour=22;byminute=0; bysecond=0        +000 04:00:00

7 rows selected.

Then, if you want to disable a task you can do it this way:

SQL> begin
dbms_auto_task_admin.disable
(client_name => 'sql tuning advisor',
operation     => NULL,
window_name   => NULL);
end;
/

PL/SQL procedure successfully completed.

SQL> select client_name, status, window_group, attributes from dba_autotask_client;

CLIENT_NAME                      STATUS     WINDOW_GROUP         ATTRIBUTES
-------------------------------- ---------- -------------------- ----------------------------------------
auto optimizer stats collection  ENABLED    ORA$AT_WGRP_OS       ON BY DEFAULT, VOLATILE, SAFE TO KILL
auto space advisor               ENABLED    ORA$AT_WGRP_SA       ON BY DEFAULT, VOLATILE, SAFE TO KILL
sql tuning advisor               DISABLED   ORA$AT_WGRP_SQ       ONCE PER WINDOW, ON BY DEFAULT, VOLATILE
                                                                 , SAFE TO KILL

And for enabling it again:

SQL> begin
dbms_auto_task_admin.enable
(client_name => 'sql tuning advisor',
operation     => NULL,
window_name   => NULL);
end;
/

PL/SQL procedure successfully completed.

SQL> select client_name, status, window_group, attributes from dba_autotask_client;

CLIENT_NAME                      STATUS     WINDOW_GROUP         ATTRIBUTES
-------------------------------- ---------- -------------------- ----------------------------------------
auto optimizer stats collection  ENABLED    ORA$AT_WGRP_OS       ON BY DEFAULT, VOLATILE, SAFE TO KILL
auto space advisor               ENABLED    ORA$AT_WGRP_SA       ON BY DEFAULT, VOLATILE, SAFE TO KILL
sql tuning advisor               ENABLED    ORA$AT_WGRP_SQ       ONCE PER WINDOW, ON BY DEFAULT, VOLATILE
                                                                 , SAFE TO KILL

Now, if you want to disable a task for a certain day (window), you can do this:

SQL> begin
dbms_auto_task_admin.disable
(client_name => 'sql tuning advisor',
operation     => NULL,
window_name   => 'MONDAY_WINDOW');
end;
/

PL/SQL procedure successfully completed.

SQL> select WINDOW_NAME, OPTIMIZER_STATS, SEGMENT_ADVISOR, SQL_TUNE_ADVISOR from dba_autotask_window_clients;

WINDOW_NAME          OPTIMIZER_STATS          SEGMENT_ADVISOR          SQL_TUNE_ADVISOR
-------------------- ------------------------ ------------------------ ------------------------
WEDNESDAY_WINDOW     ENABLED                  ENABLED                  ENABLED
FRIDAY_WINDOW        ENABLED                  ENABLED                  ENABLED
SATURDAY_WINDOW      ENABLED                  ENABLED                  ENABLED
THURSDAY_WINDOW      ENABLED                  ENABLED                  ENABLED
TUESDAY_WINDOW       ENABLED                  ENABLED                  ENABLED
SUNDAY_WINDOW        ENABLED                  ENABLED                  ENABLED
MONDAY_WINDOW        ENABLED                  ENABLED                  DISABLED

7 rows selected.

And for enabling a task in a window:

SQL> begin
dbms_auto_task_admin.enable
(client_name => 'sql tuning advisor',
operation     => NULL,
window_name   => 'MONDAY_WINDOW');
end;
/

PL/SQL procedure successfully completed.

SQL> select WINDOW_NAME, OPTIMIZER_STATS, SEGMENT_ADVISOR, SQL_TUNE_ADVISOR from dba_autotask_window_clients;

WINDOW_NAME          OPTIMIZER_STATS          SEGMENT_ADVISOR          SQL_TUNE_ADVISOR
-------------------- ------------------------ ------------------------ ------------------------
WEDNESDAY_WINDOW     ENABLED                  ENABLED                  ENABLED
FRIDAY_WINDOW        ENABLED                  ENABLED                  ENABLED
SATURDAY_WINDOW      ENABLED                  ENABLED                  ENABLED
THURSDAY_WINDOW      ENABLED                  ENABLED                  ENABLED
TUESDAY_WINDOW       ENABLED                  ENABLED                  ENABLED
SUNDAY_WINDOW        ENABLED                  ENABLED                  ENABLED
MONDAY_WINDOW        ENABLED                  ENABLED                  ENABLED

7 rows selected.

Finally, if you want to change window attributes like the repeat interval, you can do it as usual:

SQL> exec dbms_scheduler.set_attribute('MONDAY_WINDOW','repeat_interval','freq=daily;byday=MON;byhour=21;byminute=0; bysecond=0');

PL/SQL procedure successfully completed.

SQL> select window_name, enabled, repeat_interval, duration from dba_scheduler_windows where window_name in
(select WINDOW_NAME from dba_scheduler_wingroup_members where WINDOW_GROUP_NAME='ORA$AT_WGRP_SQ');

WINDOW_NAME          ENABLED REPEAT_INTERVAL                                              DURATION
-------------------- ------- ------------------------------------------------------------ ---------------
SATURDAY_WINDOW      TRUE    freq=daily;byday=SAT;byhour=6;byminute=0; bysecond=0         +000 20:00:00
MONDAY_WINDOW        TRUE    freq=daily;byday=MON;byhour=21;byminute=0; bysecond=0        +000 04:00:00
WEDNESDAY_WINDOW     TRUE    freq=daily;byday=WED;byhour=22;byminute=0; bysecond=0        +000 04:00:00
THURSDAY_WINDOW      TRUE    freq=daily;byday=THU;byhour=22;byminute=0; bysecond=0        +000 04:00:00
SUNDAY_WINDOW        TRUE    freq=daily;byday=SUN;byhour=6;byminute=0; bysecond=0         +000 20:00:00
FRIDAY_WINDOW        TRUE    freq=daily;byday=FRI;byhour=22;byminute=0; bysecond=0        +000 04:00:00
TUESDAY_WINDOW       TRUE    freq=daily;byday=TUE;byhour=22;byminute=0; bysecond=0        +000 04:00:00

7 rows selected.

And just in case you want to know what operation is associated to what client:

SQL> column OPERATION_NAME format a40
SQL> select client_name, operation_name from dba_autotask_operation;

CLIENT_NAME                      OPERATION_NAME
-------------------------------- ----------------------------------------
auto optimizer stats collection  auto optimizer stats job
auto space advisor               auto space advisor job
sql tuning advisor               automatic sql tuning task

More information:

Managing Automated Database Maintenance Tasks
DBMS_AUTO_TASK_ADMIN

Thursday, March 7, 2013

Fixing failures in Oracle 11g with DRA and RMAN

The Data Recovery Advisor was introduced in Oracle 11g, a feature that gives advice regarding database failures and also repairs them if possible. The DRA relies in the Health Monitor for diagnostics and can be used by means of RMAN or Enterprise Manager. It looks fine but since I don't have a lot of faulty databases I don't have experience with it; however, I have to study it in order to get the OCP 11g certification upgrade. At first look it seems like a good feature for administrators in a hurry or without so much experience, but personally I prefer to do things by hand and by myself.

Anyway, if you want to check the DRA capabilities and you have a database for practicing it might be worth the time trying it. The following is an exercise for doing that; first of all, back up your database because backups are an important part of recovering using the DRA. Next, shut down your database and rename (or delete if you're faithful enough) a datafile, and try to start up your database:

oracle@olimpo:~$ sqlplus '/ as sysdba'

SQL*Plus: Release 11.2.0.1.0 Production on Wed Mar 6 13:35:56 2013

Copyright (c) 1982, 2009, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup
ORACLE instance started.

Total System Global Area  263639040 bytes
Fixed Size      1335892 bytes
Variable Size    213913004 bytes
Database Buffers    41943040 bytes
Redo Buffers      6447104 bytes
Database mounted.
ORA-01157: cannot identify/lock data file 5 - see DBWR trace file
ORA-01110: data file 5: '/opt/oracle/oradata/orcl11g/example01.dbf'

The database didn't open as expected so the next step is running RMAN to use the DRA. As you might know, it's necessary to have the database at least mounted in order to be able to use RMAN:

oracle@olimpo:~$ rman target /

Recovery Manager: Release 11.2.0.1.0 - Production on Wed Mar 6 13:36:56 2013

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

connected to target database: ORCL11G (DBID=915822427, not open)

The next step is listing the current known failures:

RMAN> list failure;

using target database control file instead of recovery catalog
List of Database Failures
=========================

Failure ID Priority Status    Time Detected Summary
---------- -------- --------- ------------- -------
42         HIGH     OPEN      06-MAR-13     One or more non-system datafiles are missing

If you want to know more details about certain failure you can do it as well:

RMAN> list failure 42 detail;

List of Database Failures
=========================

Failure ID Priority Status    Time Detected Summary
---------- -------- --------- ------------- -------
42         HIGH     OPEN      06-MAR-13     One or more non-system datafiles are missing
  Impact: See impact for individual child failures
  List of child failures for parent failure ID 42
  Failure ID Priority Status    Time Detected Summary
  ---------- -------- --------- ------------- -------
  145        HIGH     OPEN      06-MAR-13     Datafile 5: '/opt/oracle/oradata/orcl11g/example01.dbf' is missing
    Impact: Some objects in tablespace EXAMPLE might be unavailable

The Health Monitor runs diagnostics automatically when some high or critical failure happens, but if for some reason there is not a diagnostic performed first then you cannot get advice regarding a failure. The next step involves getting advice regarding the open failures:

RMAN> advise failure;

List of Database Failures
=========================

Failure ID Priority Status    Time Detected Summary
---------- -------- --------- ------------- -------
42         HIGH     OPEN      06-MAR-13     One or more non-system datafiles are missing
  Impact: See impact for individual child failures
  List of child failures for parent failure ID 42
  Failure ID Priority Status    Time Detected Summary
  ---------- -------- --------- ------------- -------
  145        HIGH     OPEN      06-MAR-13     Datafile 5: '/opt/oracle/oradata/orcl11g/example01.dbf' is missing
    Impact: Some objects in tablespace EXAMPLE might be unavailable

analyzing automatic repair options; this may take some time
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=133 device type=DISK
analyzing automatic repair options complete

Mandatory Manual Actions
========================
no manual actions available

Optional Manual Actions
=======================
1. If file /opt/oracle/oradata/orcl11g/example01.dbf was unintentionally renamed or moved, restore it

Automated Repair Options
========================
Option Repair Description
------ ------------------
1      NOARCHIVELOG mode restore datafile 5  
  Strategy: The repair includes complete media recovery with no data loss
  Repair script: /opt/oracle/diag/rdbms/orcl11g/orcl11g/hm/reco_1400180561.hm

As you might guess, the DRA first checks relevant and mandatory prerequisites before giving any automated repair option, like database backups. The optional manual actions are recommendations about solving the failure by hand, instead of letting the ADR fix the failures automatically. Sometimes you have to carry on mandatory manual actions before being able to do any manual or automated repair, and in this case there is no mandatory manual actions to do.

Moreover, there are two kinds of repair options: with no data loss and with data loss, be sure to check the strategy to know if the selected repair option involves no data loss, or at least to understand what you're going to lose if you select that option in case there is no better option.

Also, if you want you can check and modify the repair script listed before doing any repair, or you can preview the repair procedure without doing any repair at all this way:

RMAN> repair failure preview;

Strategy: The repair includes complete media recovery with no data loss
Repair script: /opt/oracle/diag/rdbms/orcl11g/orcl11g/hm/reco_1400180561.hm

contents of repair script:
   # NOARCHIVELOG mode restore datafile
   restore datafile 5;
   recover datafile 5;

If you're happy with the proposed automatic repair options then you can repair the open database failures this way:

RMAN> repair failure;

Strategy: The repair includes complete media recovery with no data loss
Repair script: /opt/oracle/diag/rdbms/orcl11g/orcl11g/hm/reco_1400180561.hm

contents of repair script:
   # NOARCHIVELOG mode restore datafile
   restore datafile 5;
   recover datafile 5;

Do you really want to execute the above repair (enter YES or NO)? YES
executing repair script

Starting restore at 06-MAR-13
using channel ORA_DISK_1

channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00005 to /opt/oracle/oradata/orcl11g/example01.dbf
channel ORA_DISK_1: reading from backup piece /opt/oracle/flash_recovery_area/ORCL11G/backupset/2013_03_05/o1_mf_nnndf_TAG20130305T165413_8mdxwq5t_.bkp
channel ORA_DISK_1: piece handle=/opt/oracle/flash_recovery_area/ORCL11G/backupset/2013_03_05/o1_mf_nnndf_TAG20130305T165413_8mdxwq5t_.bkp tag=TAG20130305T165413
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:07
Finished restore at 06-MAR-13

Starting recover at 06-MAR-13
using channel ORA_DISK_1

starting media recovery
media recovery complete, elapsed time: 00:00:01

Finished recover at 06-MAR-13
repair failure complete

Do you want to open the database (enter YES or NO)? YES
database opened

RMAN> exit


Recovery Manager complete.

You have to confirm that you really want to execute repairs, and as a final step you might want to open the database before leaving RMAN. And that's all! In this case everything went fine and the database was repaired with no data loss:

oracle@olimpo:~$ sqlplus '/ as sysdba'

SQL*Plus: Release 11.2.0.1.0 Production on Wed Mar 6 13:43:10 2013

Copyright (c) 1982, 2009, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> column INSTANCE_NAME format a20
SQL> column STATUS format a20
SQL> column DATABASE_STATUS format a20
SQL> select INSTANCE_NAME, STATUS, DATABASE_STATUS from v$instance;

INSTANCE_NAME      STATUS    DATABASE_STATUS
-------------------- -------------------- --------------------
orcl11g       OPEN    ACTIVE

More information:

Diagnosing and Repairing Failures with Data Recovery Advisor

Wednesday, March 6, 2013

RMAN crash course

It has been a long time since I posted here! In my previous job (this is my second week in my new one) I didn't have so much chance to keep posting because my position didn't involve neither UNIX server administration nor database administration, but I hope this new job will give me plenty of material to write interesting posts.

So let's begin with something simple, backing up an Oracle database with RMAN. Let's say that you're learning database management or you have a small and hopefully non-critical database that you want to backup, and you don't mind using the Flash Recovery Area; the directory used as FRA is set by the db_recovery_file_dest parameter in the database. First of all, you have to set the Oracle environment variables needed to connect to your selected database:

oracle@olimpo:~/rman$ set|grep ORACLE
ORACLE_BASE=/opt/oracle
ORACLE_HOME=/opt/oracle/product/10.2.0/server
ORACLE_SID=orcl11g

Then you can run RMAN to connect to the current target database (orcl11g):

oracle@olimpo:~/rman$ rman target /

Recovery Manager: Release 11.2.0.1.0 - Production on Tue Mar 5 16:53:00 2013

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

connected to target database: ORCL11G (DBID=915822427, not open)

You might have noticed that my database is not opened, that's because it has no archive logs so in order to make a consistent backup I put it in mount state. If this is the first time you launch RMAN then it would be better to check its parameters:

RMAN> show all;
using target database control file instead of recovery catalog
RMAN configuration parameters for database with db_unique_name ORCL11G are:
CONFIGURE RETENTION POLICY TO REDUNDANCY 1; # default
CONFIGURE BACKUP OPTIMIZATION OFF; # default
CONFIGURE DEFAULT DEVICE TYPE TO DISK; # default
CONFIGURE CONTROLFILE AUTOBACKUP OFF; # default
CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '%F'; # default
CONFIGURE DEVICE TYPE DISK PARALLELISM 1 BACKUP TYPE TO BACKUPSET; # default
CONFIGURE DATAFILE BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default
CONFIGURE ARCHIVELOG BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default
CONFIGURE MAXSETSIZE TO UNLIMITED; # default
CONFIGURE ENCRYPTION FOR DATABASE OFF; # default
CONFIGURE ENCRYPTION ALGORITHM 'AES128'; # default
CONFIGURE COMPRESSION ALGORITHM 'BASIC' AS OF RELEASE 'DEFAULT' OPTIMIZE FOR LOAD TRUE ; # default
CONFIGURE ARCHIVELOG DELETION POLICY TO NONE; # default
CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/opt/oracle/product/10.2.0/server/dbs/snapcf_orcl11g.f'; # default

For example, you might want to check the compression algorithm and retention policy (how many backups you want to keep at the same time); if you want to change something you just have to issue the command the same way it's shown:

RMAN> CONFIGURE RETENTION POLICY TO REDUNDANCY 2;
new RMAN configuration parameters:
CONFIGURE RETENTION POLICY TO REDUNDANCY 2;
new RMAN configuration parameters are successfully stored

RMAN> show all;
RMAN configuration parameters for database with db_unique_name ORCL11G are:
CONFIGURE RETENTION POLICY TO REDUNDANCY 2;
CONFIGURE BACKUP OPTIMIZATION OFF; # default
CONFIGURE DEFAULT DEVICE TYPE TO DISK; # default
CONFIGURE CONTROLFILE AUTOBACKUP OFF; # default
CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '%F'; # default
CONFIGURE DEVICE TYPE DISK PARALLELISM 1 BACKUP TYPE TO BACKUPSET; # default
CONFIGURE DATAFILE BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default
CONFIGURE ARCHIVELOG BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default
CONFIGURE MAXSETSIZE TO UNLIMITED; # default
CONFIGURE ENCRYPTION FOR DATABASE OFF; # default
CONFIGURE ENCRYPTION ALGORITHM 'AES128'; # default
CONFIGURE COMPRESSION ALGORITHM 'BASIC' AS OF RELEASE 'DEFAULT' OPTIMIZE FOR LOAD TRUE ; # default
CONFIGURE ARCHIVELOG DELETION POLICY TO NONE; # default
CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/opt/oracle/product/10.2.0/server/dbs/snapcf_orcl11g.f'; # default

Then if you're ready to backup your database issue this simple backup command:

RMAN> backup check logical database;

Starting backup at 05-MAR-13
using channel ORA_DISK_1
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00001 name=/opt/oracle/oradata/orcl11g/system01.dbf
input datafile file number=00002 name=/opt/oracle/oradata/orcl11g/sysaux01.dbf
input datafile file number=00005 name=/opt/oracle/oradata/orcl11g/example01.dbf
input datafile file number=00003 name=/opt/oracle/oradata/orcl11g/undotbs01.dbf
input datafile file number=00004 name=/opt/oracle/oradata/orcl11g/users01.dbf
channel ORA_DISK_1: starting piece 1 at 05-MAR-13
channel ORA_DISK_1: finished piece 1 at 05-MAR-13
piece handle=/opt/oracle/flash_recovery_area/ORCL11G/backupset/2013_03_05/o1_mf_nnndf_TAG20130305T165413_8mdxwq5t_.bkp tag=TAG20130305T165413 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:01:16
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
including current control file in backup set
including current SPFILE in backup set
channel ORA_DISK_1: starting piece 1 at 05-MAR-13
channel ORA_DISK_1: finished piece 1 at 05-MAR-13
piece handle=/opt/oracle/flash_recovery_area/ORCL11G/backupset/2013_03_05/o1_mf_ncsnf_TAG20130305T165413_8mdxz43d_.bkp tag=TAG20130305T165413 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 05-MAR-13

This way you can backup a database with no archive logs enabled; if you have a database with archive logs enabled then append the plus archivelog subclause at the end of the backup command. Also, the check logical subclause is for checking logical errors in the database while performing the backup, the command for backing up a database can be as simple as backup database.

Now, if you want to check that your backup is really there you can do it this way:

RMAN> list backup summary;

List of Backups
===============
Key     TY LV S Device Type Completion Time #Pieces #Copies Compressed Tag
------- -- -- - ----------- --------------- ------- ------- ---------- ---
1       B  F  A DISK        05-MAR-13       1       1       NO         TAG20130305T165413
2       B  F  A DISK        05-MAR-13       1       1       NO         TAG20130305T165413

But wait, why there are two backups instead of one? Maybe a detailed list would show us why:

RMAN> list backup;

List of Backup Sets
===================


BS Key  Type LV Size       Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ ---------------
1       Full    1.00G      DISK        00:01:15     05-MAR-13      
        BP Key: 1   Status: AVAILABLE  Compressed: NO  Tag: TAG20130305T165413
        Piece Name: /opt/oracle/flash_recovery_area/ORCL11G/backupset/2013_03_05/o1_mf_nnndf_TAG20130305T165413_8mdxwq5t_.bkp
  List of Datafiles in backup set 1
  File LV Type Ckp SCN    Ckp Time  Name
  ---- -- ---- ---------- --------- ----
  1       Full 854520     05-MAR-13 /opt/oracle/oradata/orcl11g/system01.dbf
  2       Full 854520     05-MAR-13 /opt/oracle/oradata/orcl11g/sysaux01.dbf
  3       Full 854520     05-MAR-13 /opt/oracle/oradata/orcl11g/undotbs01.dbf
  4       Full 854520     05-MAR-13 /opt/oracle/oradata/orcl11g/users01.dbf
  5       Full 854520     05-MAR-13 /opt/oracle/oradata/orcl11g/example01.dbf

BS Key  Type LV Size       Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ ---------------
2       Full    9.36M      DISK        00:00:02     05-MAR-13      
        BP Key: 2   Status: AVAILABLE  Compressed: NO  Tag: TAG20130305T165413
        Piece Name: /opt/oracle/flash_recovery_area/ORCL11G/backupset/2013_03_05/o1_mf_ncsnf_TAG20130305T165413_8mdxz43d_.bkp
  SPFILE Included: Modification time: 05-MAR-13
  SPFILE db_unique_name: ORCL11G
  Control File Included: Ckp SCN: 854520       Ckp time: 05-MAR-13

The first backup set is the full backup of the database, and the second one is the backup of the database's SPFILE, made by default. And before exiting from RMAN and in order to not fill up the FRA (its size set in the db_recovery_file_dest_size parameter), let's delete the obsolete backups, that is, backups no longer needed according to the retention policy:

RMAN> delete noprompt obsolete;
RMAN retention policy will be applied to the command
RMAN retention policy is set to redundancy 2
using channel ORA_DISK_1
no obsolete backups found

RMAN> quit

Recovery Manager complete.

More information:

Getting Started with RMAN
Oracle Database Backup and Recovery Reference (Backup)

Saturday, May 12, 2012

Configuring a Tata Photon+ USB device in Linux

It has been a long time since my last post because I changed jobs three months ago, and in fact this post might give a hint about my new situation; anyway I hope to keep posting Linux and UNIX-related messages and this one is about dealing with a mobile broadband USB device. It is not so difficult to work with this kind of devices in Linux, and is very simple to configure it with NetworkManager, but first you need to have installed the usb-modeswitch software that changes the device's state from USB storage to USB modem in order to use the USB dongle as a modem. In Debian Squeeze you can do it installing the usb-modeswitch package. If you don't have the usb-modeswitch software you will see just an USB storage device, something like this:
May 12 15:23:38 olimpo kernel: [   64.172559] usb 2-1: new full speed USB device using uhci_hcd and address 2
May 12 15:23:38 olimpo kernel: [   64.339573] usb 2-1: New USB device found, idVendor=12d1, idProduct=1446
May 12 15:23:38 olimpo kernel: [   64.339584] usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=4
May 12 15:23:38 olimpo kernel: [   64.339593] usb 2-1: Product: HUAWEI Mobile
May 12 15:23:38 olimpo kernel: [   64.339600] usb 2-1: Manufacturer: HUAWEI TECHNOLOGIES
May 12 15:23:38 olimpo kernel: [   64.339607] usb 2-1: SerialNumber: XXXXXXXXXXXXXXXXXX
May 12 15:23:38 olimpo kernel: [   64.339850] usb 2-1: configuration #1 chosen from 1 choice
May 12 15:23:38 olimpo kernel: [   64.350119] scsi4 : SCSI emulation for USB Mass Storage devices
May 12 15:23:38 olimpo kernel: [   64.360054] usb-storage: device found at 2
May 12 15:23:38 olimpo kernel: [   64.360061] usb-storage: waiting for device to settle before scanning
After installing the usb-modeswitch software you can plug your USB modem, and you will see at the end something like this:
May 12 15:23:40 olimpo usb_modeswitch: switching 12d1:1446 (HUAWEI TECHNOLOGIES: HUAWEI Mobile)
May 12 15:23:46 olimpo kernel: [   70.408061] usb 2-1: new full speed USB device using uhci_hcd and address 3
May 12 15:23:46 olimpo kernel: [   70.576318] usb 2-1: New USB device found, idVendor=12d1, idProduct=140b
May 12 15:23:46 olimpo kernel: [   70.576330] usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=4
May 12 15:23:46 olimpo kernel: [   70.576339] usb 2-1: Product: HUAWEI Mobile
May 12 15:23:46 olimpo kernel: [   70.576345] usb 2-1: Manufacturer: HUAWEI TECHNOLOGIES
May 12 15:23:46 olimpo kernel: [   70.576352] usb 2-1: SerialNumber: XXXXXXXXXXXXXXXXXX
May 12 15:23:46 olimpo kernel: [   70.576600] usb 2-1: configuration #1 chosen from 1 choice
May 12 15:23:46 olimpo kernel: [   70.587360] scsi8 : SCSI emulation for USB Mass Storage devices
May 12 15:23:46 olimpo kernel: [   70.588245] usb-storage: device found at 3
May 12 15:23:46 olimpo kernel: [   70.588252] usb-storage: waiting for device to settle before scanning
May 12 15:23:46 olimpo kernel: [   70.785972] usbcore: registered new interface driver usbserial
May 12 15:23:46 olimpo usb_modeswitch: switched to 12d1:140b (HUAWEI TECHNOLOGIES: HUAWEI Mobile)
May 12 15:23:46 olimpo kernel: [   70.787067] USB Serial support registered for generic
May 12 15:23:46 olimpo kernel: [   70.788192] usbcore: registered new interface driver usbserial_generic
May 12 15:23:46 olimpo kernel: [   70.788201] usbserial: USB Serial Driver core
May 12 15:23:46 olimpo kernel: [   71.042606] USB Serial support registered for GSM modem (1-port)
May 12 15:23:46 olimpo kernel: [   71.048829] option 2-1:1.0: GSM modem (1-port) converter detected
May 12 15:23:46 olimpo kernel: [   71.050191] usb 2-1: GSM modem (1-port) converter now attached to ttyUSB0
May 12 15:23:46 olimpo kernel: [   71.050242] option 2-1:1.1: GSM modem (1-port) converter detected
May 12 15:23:46 olimpo kernel: [   71.050660] usb 2-1: GSM modem (1-port) converter now attached to ttyUSB1
May 12 15:23:46 olimpo kernel: [   71.050702] option 2-1:1.2: GSM modem (1-port) converter detected
May 12 15:23:46 olimpo kernel: [   71.056719] usb 2-1: GSM modem (1-port) converter now attached to ttyUSB2
May 12 15:23:46 olimpo kernel: [   71.056795] usbcore: registered new interface driver option
May 12 15:23:46 olimpo kernel: [   71.056803] option: v0.7.2:USB Driver for GSM modems
This is the most important part, to have detected your USB dongle as a GSM modem. If you like you can configure it easily with NetworkManager, but if you prefer something that you can do in a terminal then install the wvdial package and fill the /etc/wvdial.conf and /etc/ppp/chap-secrets files with this:
olimpo:/var/log# cat /etc/wvdial.conf
[Dialer Defaults]
Init1 = ATZ
Init2 = ATQ0 V1 E1 S0=0 &C1 &D2 +FCLASS=0
Modem Type = USB Modem
ISDN = 0
Phone = #777
New PPPD = yes
Modem = /dev/ttyUSB0
Username = internet
Password = internet
Baud = 230400

olimpo:/var/log# cat /etc/ppp/chap-secrets
# Secrets for authentication using CHAP
# client server secret   IP addresses
internet * internet
And if you're using Debian you can also issue the dpkg-reconfigure wvdial command and give all the information required to the configuration program. Then you can launch wvdial as root from a terminal prompt and connect to the Internet, and you will see something like this in the syslog:
May 12 15:47:06 olimpo pppd[11490]: pppd 2.4.5 started by root, uid 0
May 12 15:47:06 olimpo kernel: [ 1770.688570] PPP generic driver version 2.4.2
May 12 15:47:06 olimpo pppd[11490]: Using interface ppp0
May 12 15:47:06 olimpo pppd[11490]: Connect: ppp0 <--> /dev/ttyUSB0
May 12 15:47:09 olimpo pppd[11490]: CHAP authentication succeeded
May 12 15:47:09 olimpo pppd[11490]: CHAP authentication succeeded
May 12 15:47:09 olimpo kernel: [ 1773.210976] PPP BSD Compression module registered
May 12 15:47:09 olimpo kernel: [ 1773.225216] PPP Deflate Compression module registered
May 12 15:47:09 olimpo pppd[11490]: local  IP address 59.161.17.54
May 12 15:47:09 olimpo pppd[11490]: remote IP address 172.29.145.65
May 12 15:47:09 olimpo pppd[11490]: primary   DNS address 4.2.2.3
May 12 15:47:09 olimpo pppd[11490]: secondary DNS address 121.242.190.181

...

May 12 15:54:11 olimpo pppd[11490]: Terminating on signal 15
May 12 15:54:11 olimpo pppd[11490]: Connect time 7.1 minutes.
May 12 15:54:11 olimpo pppd[11490]: Sent 460447 bytes, received 1883330 bytes.
Of course, this procedure might be helpful for configuring other USB modems as well.

Monday, February 13, 2012

ORA-09817 error and full file systems

Let's say that you're trying to log into your database and suddenly got this error message:

oracle@mydb$ sqlplus '/ as sysdba'

SQL*Plus: Release 11.1.0.7.0 - Production on Mon Feb 13 15:47:27 2012

Copyright (c) 1982, 2008, Oracle. All rights reserved.

ERROR:
ORA-09817: Write to audit file failed.
SVR4 Error: 28: No space left on device
ORA-01075: you are currently logged on


Enter user-name:

oracle@mydb$

This is an easy problem to guess and the key is in the message No space left on device. The instance cannot write audit files because the file system is full so you have to figure out why is that file system full:

oracle@mydb$ set | grep ORA
ORACLE_BASE=/oracle
ORACLE_HOME=/oracle/orahome
ORACLE_SID=mydb

oracle@mydb$ strings $ORACLE_HOME/dbs/*mydb.ora | grep audit_file_dest
*.audit_file_dest='/oracle/admin/mydb/adump'

oracle@mydb$ df -k | egrep '(Filesystem)|(/oracle)'
Filesystem kbytes used avail capacity Mounted on
/dev/vx/dsk/orafs/oracle 16327680 16327680 0 100% /oracle

In this case the audit file destination is in the Oracle base file system, therefore the problem might be in other directory:

oracle@mydb$ du -k /oracle | sort -nr | head
16035094 /oracle
8412062 /oracle/diag
8297523 /oracle/diag/rdbms/mydb/mydb
8297523 /oracle/diag/rdbms/mydb
8297523 /oracle/diag/rdbms
8250027 /oracle/diag/rdbms/mydb/mydb/trace
6926099 /oracle/orahome
1631834 /oracle/orahome/mydirectory
938085 /oracle/orahome/mydirectory/somefiles
727017 /oracle/orahome/bin

oracle@mydb$ du -ka /oracle/diag/rdbms/mydb/mydb/trace | sort -nr | head
8250027 /oracle/diag/rdbms/mydb/mydb/trace
5737533 /oracle/diag/rdbms/mydb/mydb/trace/mydb_ora_5198.trc
2506955 /oracle/diag/rdbms/mydb/mydb/trace/mydb_ora_18305.trc
1946 /oracle/diag/rdbms/mydb/mydb/trace/mydb_ora_5198.trm
969 /oracle/diag/rdbms/mydb/mydb/trace/mydb_ora_18305.trm
457 /oracle/diag/rdbms/mydb/mydb/trace/mydb_ora_3695.trc
380 /oracle/diag/rdbms/mydb/mydb/trace/alert_mydb.log
314 /oracle/diag/rdbms/mydb/mydb/trace/mydb_j002_8800.trc
314 /oracle/diag/rdbms/mydb/mydb/trace/mydb_j002_13295.trc
314 /oracle/diag/rdbms/mydb/mydb/trace/mydb_j000_23311.trc

Here is the problem: there are two very big trace files that occupy 8 gigabytes, half the file system size. When you know why your file system is full you have to decide what to do, like erasing, moving or compressing files, just be sure not to discard files you might need.