0506-349 "Cannot unmount"

Status
Not open for further replies.

foobar13

MIS
May 8, 2007
24
CA
I see from thread52-624860 that this has been an ongoing problem. We're just migrating from Tru64 to AIX 5.2. I have a recent fix kit installed, and I've configured HACMP for Oracle 10g. I'm getting the dreaded "cannot unmount" despite the fact that fuser and lsof can't find any open files. svmon -S shows activity on the filesystem that won't unmount, but I don't know what's got its hooks in there. Rebooting will fix it, but that's just silly. I'm shocked if that's what IBM's answer to this is because what's the point of HACMP if it can't handle this? I'm also horrified that the previous thread lasted two years. I guess I'm getting used to open source.

The bugger of it is that there appears to be a race condition because it doesn't happen on every HACMP failover. There are NFS shares in there as well as a namefs remount of the oradata directories to enable cio. Yes, I know it's not simple. You should see the rest of the environment.

If you have any suggestions for working through this I'd really appreciate hearing about them. I've been through the obvious stuff; what I'm looking for is figuring out what svmon -S is telling me and linking that back to something I can kill.

 
Just a hunch:

I'd look for symbolic links from one filesystem to the one you can't unmount. Mostly with oracle it is due to a shared library from one installation which is still in use by some other oracle installation or process (listener?)
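One way to hunt for such links is sketched below. It is only a rough sketch: the function name links_into_fs is made up for illustration, it reads the link target from ls -l output (so it assumes paths that do not contain " -> "), and it only catches links with absolute targets.

```shell
# links_into_fs SEARCH_DIR MOUNT_POINT
# Hypothetical helper: list symbolic links under SEARCH_DIR whose
# (absolute) target lies on MOUNT_POINT.
links_into_fs() {
    search_dir=$1
    mount_point=${2%/}          # drop a trailing slash, if any

    find "$search_dir" -type l -print 2>/dev/null | while read -r link
    do
        # "ls -l" prints "... name -> target"; keep only the target
        target=$(ls -l "$link" | sed 's/.* -> //')
        case "$target" in
            "$mount_point"|"$mount_point"/*)
                printf '%s -> %s\n' "$link" "$target" ;;
        esac
    done
}
```

For example, links_into_fs /home/oracle /oraload/d01 would print every link under /home/oracle that points into /oraload/d01 (both paths are just examples).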



HTH,

p5wizard
 
Good morning,

this is a well known problem in our division :)

So we wrote a little script getting rid of anything that prevented our filesystems from unmounting.

I'm not sure whether this will help in your case or not but it sure can't hurt to try:

First Part
----------
# kill every process owned by the application users
ps -u $SIDADM,$DB2ADM -o pid="" | while read PID
do
    kill -9 $PID
done

--> $SIDADM and $DB2ADM stand for any application users besides root that are running processes on your system (e.g. your oracle user). Change them to match your system.
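If kill -9 straight away feels too blunt (the thread's original problem involved a process that died without cleaning up), a gentler variant is sketched here. This is not part of the original script; term_then_kill and its grace period are names and defaults picked purely for illustration.

```shell
# term_then_kill PID [GRACE_SECONDS]
# Hypothetical helper: ask politely with SIGTERM, wait up to
# GRACE_SECONDS (default 5), and only then fall back to SIGKILL.
term_then_kill() {
    pid=$1
    grace=${2:-5}

    # If the process is already gone there is nothing to do
    kill -TERM "$pid" 2>/dev/null || return 0

    i=0
    while [ "$i" -lt "$grace" ]
    do
        # kill -0 only tests whether the PID still exists
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        i=$((i + 1))
    done

    # Still there after the grace period: force it
    kill -KILL "$pid" 2>/dev/null
    return 0
}
```

In the loop above you would then call term_then_kill $PID 10 instead of kill -9 $PID.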

Second Part
------------
lsvgfs "Volume Group Name" | sort -r | while read FS
do
    # if the filesystem is mounted, kill whatever is still using it
    (mount | grep "$FS") && fuser -kxuc "$FS"
    umount "$FS"
done

--> Volume Group Name specifies the VG on which the filesystems you want to unmount reside.
--> If the filesystems are on rootvg you shouldn't run this automatically, because it might unmount filesystems you want to keep mounted.

Regards
Thomas
 
I did some investigation today using svmon and kdb. Using svmon -S you can find out what segment IDs are "open". To be honest, I don't know what a segment ID is. I think it's just a mapped part of memory and, because it exists, it's holding open a vnode on the filesystem I want to close.

kdb has a subcommand that will show you what is in the memory locations. Using the subcommand:

vsidd 03ecf:0 10

will show you 10 bytes of the vsid 03ecf (from svmon -S) starting at offset 0. Lo and behold, I found the start of the executable file for the tnslsnr. The remaining vsids from svmon -S that show stuff on my unmountable logical volume appear to be .so and .a files or fragments. There are no corresponding pids, of course. If there were, I would kill them.

Now I would like to know how to unhook these vsids and release the filesystem.

I'll try again tomorrow.

Thanks for your support.
 
Like I said before, .so and .a files point to (shared) libraries being in use. And tnslsnr is the oracle listener process.

su - [orauser] -c "lsnrctl stop"

Would take care of stopping it and closing the library files.

I'd look for symbolic links for libraries inside your oracle client-software filesystem to the oracle-server filesystems you (or the hacmp scripts) cannot unmount. Then take care of that issue so you won't get into the same situation again.

Or make sure the takeover scripts stop the listener before trying to unmount...
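As a rough sketch of that ordering in a stop script (not taken from any real HACMP script; the function name, the ORA_USER variable and its default of "oracle" are assumptions):

```shell
# Assumed name of the Oracle software owner; adjust for your system
ORA_USER=${ORA_USER:-oracle}

# stop_listener_then_unmount FILESYSTEM
# Hypothetical helper: stop the listener first so it releases its
# binaries, libraries and log files, then try the unmount.
stop_listener_then_unmount() {
    fs=$1

    # A failed "lsnrctl stop" (e.g. no listener running) is only a
    # warning here; the unmount is attempted anyway.
    su - "$ORA_USER" -c "lsnrctl stop" ||
        echo "warning: lsnrctl stop failed, trying umount anyway" >&2

    umount "$fs"
}
```

The return code is that of umount, so the calling script can still tell whether the filesystem actually came free.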


HTH,

p5wizard
 
>lsnrctl stop

LSNRCTL for IBM/AIX RISC System/6000: Version 10.2.0.3.0 - Production on 10-MAY-2007 12:51:47

Copyright (c) 1991, 2006, Oracle. All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=cook.foo.bar.com)(PORT=1521)))
TNS-12541: TNS:no listener
TNS-12560: TNS:protocol adapter error
TNS-00511: No listener
IBM/AIX RISC System/6000 Error: 79: Connection refused

There is no listener process. Starting it and stopping it doesn't help either. When this problem started, the process died without cleaning up properly.

Further investigation reveals that the file I'm seeing in the debugger is not an executable, but rather the listener.log. One of the dbas is saying that he had the file open, perhaps in vi or an oracle utility when I pushed the failover. There is no evidence of this in the process table.
 
OK, I've got a fix that's specific to AIX 5200-09-03. The problem lies in the fact that the kernel maps memory to manage open files. Sometimes the process will die, but the kernel will hold the frames in memory as in-use. According to APAR IY90815, this can happen when multiple libraries are loading and one of the loads fails. What you wind up with is the "Cannot unmount" message and no processes to kill to fix it. You can spot this by using svmon -S to see the segments the process was using. With the vsids in hand, you can then see what the open files were, and what frames are being held open.

[tt]
root@cook# svmon -S |awk '/oraload/{print $1 }' |xargs svmon -D |awk '/^ / && $1 !~ /Page/{ print $2 }' | xargs svmon -F |more

  Frame   Segid Ref Mod Pincount  State   Swbits ExtSegid LPage
2618349  1a1dda   N   N      0/0 In-Use 88000004        -     N
2618347  1a1dda   N   N      0/0 In-Use 88000004        -     N
2618345  1a1dda   N   N      0/0 In-Use 88000004        -     N
[/tt]
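The same one-liner, broken into stages so you can see what each step hands to the next. This is only a sketch: held_frames is a made-up name, the awk expressions are the ones from the command above, and it assumes svmon -D and svmon -F accept several IDs at once (which is what xargs was doing).

```shell
# held_frames PATTERN
# Hypothetical helper: show the frames behind every segment whose
# svmon -S line matches PATTERN (e.g. part of the LV name).
held_frames() {
    pattern=$1

    # Stage 1: vsids of the matching segments (first column of svmon -S)
    vsids=$(svmon -S | awk -v pat="$pattern" '$0 ~ pat { print $1 }')
    [ -n "$vsids" ] || { echo "no segments match $pattern" >&2; return 1; }

    # Stage 2: frame numbers (second column) from the per-page detail,
    # skipping the header line that contains "Page"
    frames=$(svmon -D $vsids | awk '/^ / && $1 !~ /Page/ { print $2 }')
    [ -n "$frames" ] || { echo "no frames found" >&2; return 1; }

    # Stage 3: frame details, including the State column
    svmon -F $frames
}
```

Calling held_frames oraload | more should give the same listing as the pipeline above, with an error message instead of an empty result when nothing matches.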

Note that the state is "In-Use" even though no processes are running. You can see the open files by passing the inode numbers to ncheck, or just use -j with svmon -S. The inode numbers come out of svmon -S and follow the logical volume name after the colon.

[tt]
root@cook# svmon -S |awk '/oraload/{print $1 }' |xargs svmon -j -S

  Vsid Esid Type Description                LPage Inuse Pin Pgsp Virtual
173a57    - clnt /dev/oraload_d01_lv:10312      -  3988   0    -       -
                 /oraload/d01/app/oracle/product/102/bin/tnslsnr
1a1dda    - clnt /dev/oraload_d01_lv:1831       -  3731   0    -       -
                 /oraload/d01/app/oracle/product/102/bin/lsnrctl
 c3a0c    - clnt /dev/oraload_d01_lv:21551      -  3662   0    -       -
                 /oraload/d01/app/oracle/product/102/network/log/listene
 13aa1    - clnt /dev/oraload_d01_lv:4354       -    10   0    -       -
                 /oraload/d01/app/oracle/product/102/network/mesg/tnsus.
 43aa4    - clnt /dev/oraload_d01_lv:5202       -     6   0    -       -
                 /oraload/d01/app/oracle/product/102/nls/data/lx1boot.nl
1716f7    - clnt /dev/oraload_d01_lv:10292      -     3   0    -       -
                 /oraload/d01/app/oracle/product/102/network/mesg/nlus.m
 339e3    - clnt /dev/oraload_d01_lv:7136       -     2   0    -       -
                 /oraload/d01/app/oracle/product/102/lib/libskgxn2.so
[/tt]
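If you want the device and inode pairs on their own (for ncheck, say), something like this pulls them out of the Description column. It's a sketch based only on the output format shown above; clnt_inodes is a name I made up.

```shell
# clnt_inodes
# Hypothetical filter: read svmon -S style output on stdin and print
# "device inode" for each clnt segment whose description looks like
# /dev/<lv>:<inode>.
clnt_inodes() {
    awk '$3 == "clnt" && $4 ~ /^\/dev\/.*:[0-9]+$/ {
        split($4, part, ":")    # part[1] = device, part[2] = inode
        print part[1], part[2]
    }'
}
```

Used as svmon -S | clnt_inodes, each output line can then be handed to ncheck. I believe the form is ncheck -i inode device, but treat that as an assumption and check the man page on your level of AIX.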

APAR IY90815 doesn't have a PTF yet. It's apparently due in 5200-09-07. An efix is here:

ftp://testcase.software.ibm.com/fromibm/aix/IY90815_98.070328.epkg.Z

It contains a new kernel, so you'll need to reboot. I've done some preliminary testing and I've been unable to get the "Cannot unmount" message again.
 