Monday, May 9, 2011

Oracle instances dying and read-only filesystems

You have a Linux server running one or more Oracle instances, and one day every instance is down while the listeners are still up. You check the alert log and find absolutely nothing. If you're lucky, trying to start the instances gives you an error message about a read-only filesystem:

myserver> dbstart
ORACLE_HOME_LISTNER is not SET, unable to auto-start Oracle Net Listener
Usage: /oracle/product/10.2.0/bin/dbstart ORACLE_HOME
touch: cannot touch `/oracle/product/10.2.0/startup.log': Read-only file system
chmod: changing permissions of `/oracle/product/10.2.0/startup.log': Read-only file system
Processing Database instance "mydb01": log file /oracle/product/10.2.0/startup.log
/oracle/product/10.2.0/bin/dbstart: line 361: /oracle/product/10.2.0/startup.log: Read-only file system
touch: cannot touch `/oracle/product/10.2.0/startup.log': Read-only file system
chmod: changing permissions of `/oracle/product/10.2.0/startup.log': Read-only file system
Processing Database instance "mydb02": log file /oracle/product/10.2.0/startup.log
/oracle/product/10.2.0/bin/dbstart: line 361: /oracle/product/10.2.0/startup.log: Read-only file system

myserver> touch /oracle/product/10.2.0/test.txt
touch: cannot touch `/oracle/product/10.2.0/test.txt': Read-only file system

You try to create a file with touch on the Oracle filesystem and can't, so you check the server's uptime, load and mounted filesystems:

myserver> uptime
11:34:30 up 138 days, 18:02, 1 user, load average: 0.00, 0.00, 0.00

myserver> mount
/dev/mapper/VolGroup01-LogVol00 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/cciss/c0d0p1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
/dev/mapper/appsvg-lv001 on /oracle type ext3 (rw,_netdev)
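
A note on reading that listing: the (rw) next to /oracle is not proof that the filesystem is writable. On systems of this vintage mount usually reports what is recorded in /etc/mtab, and that file is not updated when the kernel itself switches a filesystem to read-only after an I/O error. The kernel's own view is in /proc/mounts. The following is only a sketch of how you might compare the two; the function names kernel_mount_opts and is_readonly are made up for this example, and the optional second argument exists only so the functions can be pointed at a copy of the file.

```shell
# Print the options the kernel currently reports for a mount point.
# Usage: kernel_mount_opts MOUNTPOINT [mounts-file]   (default: /proc/mounts)
# Hypothetical helper name, not part of any standard tool.
kernel_mount_opts() {
    # Field 2 is the mount point, field 4 the option list.
    # If the mount point appears more than once, the last entry wins.
    awk -v mp="$1" '
        $2 == mp { opts = $4 }
        END      { if (opts != "") print opts }
    ' "${2:-/proc/mounts}"
}

# Succeed (exit status 0) if the kernel reports the mount point as read-only.
# Returns 1 for read-write mounts and for mount points that are not listed.
is_readonly() {
    case ",$(kernel_mount_opts "$1" "$2")," in
        *,ro,*) return 0 ;;
        *)      return 1 ;;
    esac
}
```

For the server in this post you would run something like: is_readonly /oracle && echo "/oracle is read-only according to the kernel"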

Everything seems fine, but you notice the _netdev option on the Oracle filesystem. When some of the filesystems in /etc/fstab depend on the network, you add this option so the operating system won't try to mount them before it has network connectivity.
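
If you want to see which filesystems on a server are marked this way, a small sketch like the one below lists every mount point whose fstab options include _netdev. The function name netdev_mounts is invented for this example, and it assumes the usual whitespace-separated fstab layout (device, mount point, type, options, ...).

```shell
# List mount points whose fstab options include _netdev.
# Usage: netdev_mounts [fstab-file]   (default: /etc/fstab)
# Hypothetical helper name, not part of any standard tool.
netdev_mounts() {
    awk '
        /^[[:space:]]*#/ { next }              # skip comment lines
        NF >= 4 {
            n = split($4, opts, ",")           # field 4 is the option list
            for (i = 1; i <= n; i++)
                if (opts[i] == "_netdev") {
                    print $2                   # field 2 is the mount point
                    break
                }
        }
    ' "${1:-/etc/fstab}"
}
```

On the server above, and assuming its /etc/fstab matches the mount listing, running netdev_mounts with no arguments should report /oracle.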

So the filesystem is reached over the network, which suggests iSCSI:

myserver> /sbin/lsmod|grep iscsi
iscsi_tcp 19785 3
libiscsi_tcp 21957 2 iscsi_tcp,cxgb3i
libiscsi2 42181 5 ib_iser,iscsi_tcp,bnx2i,cxgb3i,libiscsi_tcp
scsi_transport_iscsi2 37709 7 ib_iser,iscsi_tcp,bnx2i,cxgb3i,libiscsi2
scsi_transport_iscsi 6085 1 scsi_transport_iscsi2
scsi_mod 141717 23 mptctl,ib_iser,iscsi_tcp,bnx2i,cxgb3i,libiscsi2,scsi_transport_iscsi2,scsi_dh,sg,qla2xxx,scsi_transport_fc,mptspi,scsi_transport_spi,mptsas,mptscsih,scsi_transport_sas,usb_storage,cciss,hpahcisr,sd_mod

myserver> /sbin/iscsiadm -m session -P 2
iscsiadm: Maybe you are not root?
iscsiadm: Could not lock discovery DB: /var/lock/iscsi/lock.write: Permission denied
Target: iqn.2000-03.com.someprovider:mycompany:87:mycmp1
Current Portal: 10.0.57.23:3260,1
Persistent Portal: 10.0.57.34:3260,1
**********
Interface:
**********
Iface Name: default
Iface Transport: tcp
Iface Initiatorname: iqn.1994-05.com.redhat:1234abc56d78
Iface IPaddress: 10.0.57.85
Iface HWaddress:
Iface Netdev:
SID: 1
iSCSI Connection State: Unknown
iSCSI Session State: LOGGED_IN
Internal iscsid Session State: Unknown
************************
Negotiated iSCSI params:
************************
HeaderDigest: None
DataDigest: None
MaxRecvDataSegmentLength: 262144
MaxXmitDataSegmentLength: 262144
FirstBurstLength: 262144
MaxBurstLength: 1048576
ImmediateData: Yes
InitialR2T: No
MaxOutstandingR2T: 1

That's the problem! The Oracle filesystem sits on an iSCSI volume, so when the connection to the storage fails the filesystem turns read-only. The instances die because they cannot write to the database files, and if the logs are on the same filesystem Oracle cannot write its error messages either. The connectivity problem could be caused by heavy load, network glitches, problems with the disk appliance or something else, but if it is not too severe your sysadmin might change some iSCSI parameters to help a bit:

node.conn[0].timeo.noop_out_interval = 0
node.conn[0].timeo.noop_out_timeout = 0
node.session.timeo.replacement_timeout = 86400
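
As far as I understand these settings, setting the two noop_out values to 0 turns off the initiator's periodic ping of the target, and replacement_timeout is the number of seconds the initiator waits for a broken session to come back before it fails I/O up to the filesystem. Defaults for newly discovered targets normally live in /etc/iscsi/iscsid.conf, while an already configured target keeps its own node record that has to be updated with iscsiadm. The sketch below only prints the commands instead of running them, so you can review them first; the function name iscsi_timeout_cmds is made up for this example.

```shell
# Print (do not run) the iscsiadm commands that would update one node record.
# Usage: iscsi_timeout_cmds TARGET_IQN PORTAL
# Hypothetical helper name; the values are the ones suggested in this post.
iscsi_timeout_cmds() {
    target=$1
    portal=$2
    for kv in \
        'node.conn[0].timeo.noop_out_interval=0' \
        'node.conn[0].timeo.noop_out_timeout=0' \
        'node.session.timeo.replacement_timeout=86400'
    do
        # ${kv%%=*} is the parameter name, ${kv#*=} is its value.
        # The name is printed in quotes so the shell will not treat [0] as a glob.
        printf "iscsiadm -m node -T %s -p %s -o update -n '%s' -v %s\n" \
            "$target" "$portal" "${kv%%=*}" "${kv#*=}"
    done
}
```

For the session shown earlier you would call it with the target name and the persistent portal, for example: iscsi_timeout_cmds iqn.2000-03.com.someprovider:mycompany:87:mycmp1 10.0.57.34:3260. The printed commands need root, and my assumption is that the new values only apply to the session after the next login to the target, so plan that with your sysadmin while the databases are down.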

More information:

Filesystems becoming read-only on iSCSI
Linux* Open-iSCSI
