fscsi# Device Is Busy When Hot Swapping an HBA

Status
Not open for further replies.

aixadmn

MIS
Sep 11, 2006
6
US
I have done this many times in the past without issues, but suddenly find myself unable to hot swap a failed HBA. I'm not sure whether the cause is something I'm doing, an AIX upgrade/patch, or an EMC upgrade/patch.

Background: AIX 5300-03, p570 LPAR with 4 x HBA paths to EMC DMX800 (same SYMdevs down each of the 4 paths), Powerpath 4.5.1.0 and EMC Symmetrix Support 5.2.0.1. The p570 called home with a failed HBA, and taking the server down isn't an option (without a lot of scheduling headaches).

So, here is the approach that I'm taking:

1) Take the path offline from a Powerpath perspective.
powermt remove hba=#

2) Remove protocol device and all children.
rmdev -Rdl fscsi#

NOTE: all child devices remove as expected but the fscsi# device fails to remove with this error:
Method error (/usr/lib/methods/ucfgdevice):
0514-062 Cannot perform the requested function because the
specified device is busy.

3) Replace the device with these commands.
rmdev -l fcs#
drslot -c pci -I -R -l fcs#
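
For anyone following along, the three steps above can be sketched as a small dry-run script. This is only an illustration: the function names (run, swap_hba) and the DRY_RUN variable are made up for this sketch, and the commands and flags are exactly the ones listed in the steps. By default it only prints what it would run.

```shell
#!/bin/sh
# Dry-run sketch of the three steps above. Nothing is executed unless
# DRY_RUN=0; by default each command is only printed.
# run, swap_hba and DRY_RUN are hypothetical names used for illustration.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Arguments: HBA number as PowerPath sees it, protocol device, adapter.
swap_hba() {
    hba_num=$1
    fscsi_dev=$2
    fcs_dev=$3

    # 1) Take the path offline from a PowerPath perspective.
    run powermt remove "hba=$hba_num" || return 1

    # 2) Remove the protocol device and all children.
    #    This is the step that fails with 0514-062 in this thread.
    run rmdev -Rdl "$fscsi_dev" || return 1

    # 3) Put the adapter in the Defined state, then hot-replace it.
    run rmdev -l "$fcs_dev" || return 1
    run drslot -c pci -I -R -l "$fcs_dev"
}
```

Calling swap_hba 1 fscsi1 fcs1 with the default DRY_RUN=1 just lists the four commands, which makes it easy to review the sequence before touching a production LPAR.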

Obviously I'm not getting to step 3, but I cannot find out why the protocol device is busy. 'fuser -Vxuc /dev/fscsi#' shows several root processes, nothing that I wouldn't expect.

I have done this many times in the past on AIX 5.1 and Powerpath 3.x. I've had various cases opened with IBM, EMC and the 2 jointly, all to no avail. Can someone help point me in the right direction?
 
Some child device for fcsX/fscsiX can't be deleted... Here's one way to find out which one (and hopefully also why you couldn't delete it):

First find out the location for fcsX

lsdev -C|grep fcsX

then find out which devices are dependent (same location or a sub-address)

lsdev -C|grep XX-YY (location of first lsdev -C command)

you should see a device fcnetX, fscsiX and also the child devices of fscsiX that failed to delete.

Then run fuser /dev/hdiskNN to see what is holding you back.
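
The lookup described above could be wrapped in two small helpers. This is a rough sketch, assuming the usual lsdev -C column layout (name, state, location, description); the function names location_of and dependents_of are invented here. Both read lsdev -C output on stdin so they can be tried against saved output first.

```shell
# Print the location code (third column of `lsdev -C`) for one device.
# Reads `lsdev -C` output on stdin. Hypothetical helper name.
location_of() {
    awk -v dev="$1" '$1 == dev { print $3 }'
}

# Print every device whose location code equals the given one or is a
# sub-address of it (for example 06-08-02 under 06-08).
dependents_of() {
    awk -v loc="$1" '$3 == loc || index($3, loc "-") == 1 { print $1 }'
}

# Typical use on the server (not run here):
#   loc=$(lsdev -C | location_of fcs1)
#   lsdev -C | dependents_of "$loc"
```

Any hdisk that still shows up in the second listing is a candidate for the fuser check.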


HTH,

p5wizard
 
In this case, here are the devices...

fcnet1 Defined 06-08-01 Fibre Channel Network Protocol Device
fcs1 Available 06-08 FC Adapter
fscsi1 Available 06-08-02 FC SCSI I/O Controller Protocol Device

I don't see anything hanging off of the fscsi1 device that should keep me from deleting it. EMC support is now wanting me to issue a 'fuser -kucx /dev/fcs1' to free it up. I don't think it will do what they are expecting since whatever has it is a kernel-level process.

BTW: I don't remove the fcs adapter so as not to lose my init_link settings.
 
"Obviously I'm not getting to step 3 but cannot find a resolution to why the protocol device is busy. 'fuser -Vxuc /dev/fscsi#' shows several root processes, nothing that I wouldn't expect."

Could you please show these processes?


HTH,

p5wizard
 
/dev/fscsi1:
vfs 0c(root)
vfs 1c(root)
vfs 139458c(root)
vfs 188548c(root)
vfs fd=0 372926(root)
vfs 385212c(root)
vfs 389356c(root)
vfs fd=0 409796(root)
vfs fd=0 413762(root)
vfs fd=0 418016(root)
vfs 430098c(root)
vfs 434414c(root)
vfs 438314c(daemon)
vfs 442602c(root)
vfs 450786c(root)
vfs 454882c(root)
vfs 458996c(root)
vfs 462854c(root)
vfs 483342c(root)
vfs 491530c(root)
vfs fd=0 495670(root)
vfs 507920c(root)
vfs fd=0 512004(root)
vfs 520324c(root)
vfs 524294c(root)
vfs 528440c(root)
vfs 540684c(root)
vfs fd=0 553032(root)
vfs fd=0 561172(root)
vfs fd=0 569490(root)
vfs fd=0 577562(root)
vfs fd=0 581662(root)
vfs fd=0 585760(root)
vfs fd=0 589864(root)
vfs fd=0 593956(root)
vfs fd=0 598052(root)
vfs fd=0 630938(root)
vfs 635062c(root)
vfs 647260c(root)
vfs fd=0 667664(root)
vfs fd=0 733394(u799924)
vfs fd=0 786684(root)
vfs fd=0 962632(root)
vfs fd=0 1007626(root)
vfs 1040478c(root)
vfs fd=0 1114326(root)
vfs 1175700c(root)
vfs 1183980c(root)
vfs 1200346c(root)
vfs fd=0 1237102(root)
vfs 1282056c(root)
vfs 1310816c(root)
vfs 1327294c(u799924)
vfs fd=0 1343680(root)
vfs 1417330c(root)
vfs fd=0 1675308(u799924)
vfs fd=0 1691740(sshd)

I'm not sure why some of these processes are listed. Maybe they show up because of another parent process that has locked the fscsi device?
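
To see what these processes actually are, the PIDs can be pulled out of the listing and handed to ps. Below is a minimal sketch that assumes the output format shown above ("vfs", an optional fd= field, then PID plus an optional letter and the user in parentheses); fuser_pids is a made-up helper name. One thing worth keeping in mind: as far as I know, the -c flag makes fuser report on the file system that contains the file, so a long list of "c" (current directory) entries may simply be processes sitting in the root file system rather than processes holding the device.

```shell
# Turn `fuser -V` style lines (read on stdin) into "PID user" pairs.
# fuser_pids is a hypothetical helper; the parsing assumes the layout
# shown in this thread, e.g. "vfs 139458c(root)" or "vfs fd=0 372926(root)".
fuser_pids() {
    awk '$1 == "vfs" {
        entry = $NF                          # 139458c(root) or 372926(root)
        pid = entry
        sub(/[^0-9].*$/, "", pid)            # keep only the leading digits
        user = entry
        sub(/^[^(]*\(/, "", user)            # drop everything up to "("
        sub(/\)$/, "", user)                 # drop the closing ")"
        print pid, user
    }'
}

# Typical use on the server (not run here):
#   fuser -Vxuc /dev/fscsi1 2>&1 | fuser_pids |
#       while read -r pid user; do ps -fp "$pid"; done
```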
 
I had this same method error when I tried to do an
rmdev -dl fcs1 -R. I was trying to remove the HBA so that I could move it to another switch. The rmdev would remove many of the hdisk devices (these are the PowerPath paths) but would fail when it tried to remove the first hdiskpower (LUN) device. I did an lsdev -p fscsi1 and could see that there were some hdisk devices and some hdiskpower devices still attached. I manually removed all the remaining hdisk devices with rmdev.

After talking to EMC, they said that I did not need to remove the hdiskpower devices, just the hdisk paths. The hdiskpower devices are sometimes impossible to remove if the file systems are mounted and the VG is varied on. Anyway, the rmdev -dl fcs1 -R never completed without the method error, but I was OK as long as I could delete all the hdisk devices manually.

You should be able to unzone the bad card, add the new card, zone the new card, then run cfgmgr and powermt config. It should create all the hdisk paths and then attach the hdiskpower LUNs.
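
The manual cleanup described here (remove the hdisk paths, leave the hdiskpower devices alone) might look something like the sketch below. The helper names hdisk_paths and print_rmdev_cmds are invented for illustration, and the script only prints the rmdev commands so they can be reviewed before running anything.

```shell
# Filter `lsdev -p fscsi1` style output (read on stdin) down to the native
# hdisk path devices, skipping the hdiskpower pseudo-devices.
# Hypothetical helper names; nothing here removes a device by itself.
hdisk_paths() {
    awk '$1 ~ /^hdisk[0-9]+$/ { print $1 }'
}

# Print (not run) one rmdev command per remaining path device.
print_rmdev_cmds() {
    hdisk_paths | while read -r dev; do
        echo "rmdev -dl $dev"
    done
}

# Typical use on the server (not run here):
#   lsdev -p fscsi1 | print_rmdev_cmds
# and, once the new card is zoned:
#   cfgmgr
#   powermt config
```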
 
Yea, I could add a new card but I cannot pull this card out. The drslot command, which allows one to replace hot-pluggable cards, won't allow me to remove the card until it is in the Defined state. The only way to get fcs1 in a Defined state is to delete the fscsi1 device...hence my problem.
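
The dependency described here (drslot only proceeds once the adapter is Defined) can be expressed as a small guard. This is just a sketch with a made-up helper name, is_defined, which reads lsdev -C output on stdin and checks the state column for one device.

```shell
# Succeed only if the named device is in the Defined state according to
# `lsdev -C` output read on stdin. is_defined is a hypothetical helper.
is_defined() {
    state=$(awk -v dev="$1" '$1 == dev { print $2 }')
    [ "$state" = "Defined" ]
}

# Typical use on the server (not run here):
#   if lsdev -C | is_defined fcs1; then
#       drslot -c pci -I -R -l fcs1
#   else
#       echo "fcs1 is not Defined yet; fscsi1 is probably still configured"
#   fi
```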
 