Fun with AIX and Fibre Channel 2

Status
Not open for further replies.

Breslau

Technical User
Jul 14, 2003
278
US
All,

I've been seeing some funky results when using a 3rd party SAN with a group of 6H1s and 6M1s. When importing a VG we'll occasionally see import times of 2 to 4 minutes. While this is going on I see port login messages being sent to the console of the Brocade switch the host is talking to. I also see the other LUNs in this configuration being polled sequentially rather than in parallel (importvg checks all other known drives for consistency). When the import command responds quickly, the other LUNs are hit in parallel.

The other issue we have is loss of connectivity to imported drives once we try to do some I/O with them. errpt shows some entries about failed drives. While this is going on, we'll see 'port turned off' and 'loop down' messages logged on the RAID controller.

The fun part is that we have not been able to isolate where the problem lies. We've tried various tests, like connecting a host directly to the RAID controller and monkeying around with LUN IDs and HBA settings, with no luck. The problems don't occur consistently enough to point to any single area as the culprit.

Our basic config is:

6 AIX hosts
2 Brocade 3800s
2 CMD RAID controller pairs with two physical fibre ports per pair
1 disk chassis, 3 RAID groups per chassis
62 non-masked LUNs, i.e. all LUNs visible to all hosts

If anyone has seen similar behaviour and/or has suggestions it would be greatly appreciated!
 
If possible, you may want to completely remove the disk, dar, dac, fscsi and fcs definitions associated with the problem VGs, then run cfgmgr. Ensure the VG is varied off before deleting the aforementioned items. I have experienced AIX Fibre Channel anomalies which were cleared up by running a script to remove the disks, dars, etc., then running cfgmgr. Script can be provided.
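The script mentioned above was not posted, so what follows is only a rough sketch of the sequence described, not that script. It builds the command list for review rather than running anything. It assumes the standard AIX commands varyoffvg, rmdev -dl and cfgmgr; the function name and all device names are hypothetical placeholders, and the removal order simply follows the order given in the post.

```python
from typing import Iterable, List


def build_cleanup_commands(
    vg_name: str,
    disks: Iterable[str],
    dars: Iterable[str] = (),
    dacs: Iterable[str] = (),
    fscsi: Iterable[str] = (),
    fcs: Iterable[str] = (),
) -> List[str]:
    """Return the AIX commands, in order, as plain strings for a dry-run review.

    Nothing is executed here; the caller decides whether to run the output.
    """
    # The post says the VG must be varied off before any definitions are deleted.
    commands = [f"varyoffvg {vg_name}"]
    # Removal order follows the post: disks, dars, dacs, fscsi, then fcs.
    for group in (disks, dars, dacs, fscsi, fcs):
        for device in group:
            # rmdev -d deletes the device definition, -l names the device.
            commands.append(f"rmdev -dl {device}")
    # cfgmgr rediscovers and reconfigures the devices that were removed.
    commands.append("cfgmgr")
    return commands


if __name__ == "__main__":
    # Hypothetical names; substitute whatever lsdev reports on your host.
    for line in build_cleanup_commands(
        "datavg", disks=["hdisk4", "hdisk5"], fscsi=["fscsi0"], fcs=["fcs0"]
    ):
        print(line)
```

Printing the list first lets you check the device names against lsdev output before anything is removed, which matters when every LUN is visible to every host.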
 
An update, in case anyone else has a similar issue.

The loss of connectivity issue was addressed by turning off auto-negotiate on our Brocade ports. However, this in turn causes issues with VSD: primary hosts are not able to pick up their drives once they are booted, because they see their old drives as being 'locked'. This is a known problem on disk subsystems not certified by IBM for use with VSD.
 
Thanks Breslau

Just a question... Are you able to increase the size of a filesystem on the SAN?
 
Yeah, that was never a problem for us. Are you having an issue with that?
 
We had an issue recently. A filesystem was extended and it crashed. The filesystem had to be removed and then re-created, and the data was then restored.
The same hardware also gave us a strange jfs corruption error on the 6M1 and then halted the box.
A week later a 44P connected to the same SAN did the very same thing.
The AIX dumps came out clean (except that they showed very high I/O).
The SAN logs came out clean as well.
Still a mystery ....
 
Did you try to extend the filesystem across LUNs?

You mention jfs, so you're not using jfs2, or gpfs?

"very high i/o" - were you seeing disk(s) pegged at 100% utilization but not much of anything getting through?

IBM disk subsystem, or 3rd party?
 
One of the other SysAdmins tried to extend it across LUNs.

They are all jfs, and the I/O was at 100%, but nothing was happening on the box in terms of Oracle and the application.

The subsystem is Hitachi.

After the 2 crashes, all that was done was a microcode update on the disks on the SAN.
We'll have to wait and see if we crash again. Pretty nerve-racking, since this is the core application of the business.
 
I had the same problem; it turned out to be the drivers. We use Hitachi DLM to manage the connections. It's not cheap (approx £2000) but works like a dream. I believe the fibre cards are rebranded (I'll search for details and repost) and there is a microcode update available. Are you using 1 or 2 Gb fibre cards?

--
| Mike Nixon
| Unix Admin
|
----------------------------
 
Thanks for the info, mrn.
Very useful...

They are 1 Gb cards...
 