AIX, hdisk defined!

xiayd

IS-IT--Management
Mar 27, 2006
We have two IBM R6 machines (OS: AIX 4.3.3 ML11, HACMP 4.4.1) connected to one 7133 through 2 SSA cards and to one Sun 3310 through 2 SCSI cards.
The disks look like this:

#lsdev -Cc disk
hdisk0 Available 40-60-00-4,0 16 Bit LVD SCSI Disk Drive
hdisk1 Available 40-60-00-12,0 16 Bit LVD SCSI Disk Drive
hdisk2 Available 1A-08-L SSA Logical Disk Drive
hdisk3 Available 1A-08-L SSA Logical Disk Drive
hdisk4 Available 34-08-00-0,0 Other SCSI Disk Drive
hdisk5 Available 34-08-00-0,1 Other SCSI Disk Drive
hdisk6 Available 34-08-00-0,2 Other SCSI Disk Drive
hdisk7 Available 34-08-00-0,3 Other SCSI Disk Drive

#lspv
hdisk0 000c94cfbf73212d rootvg
hdisk1 000c94bfbf4e6182 rootvg
hdisk2 000c94cf7c312ebc IASlog
hdisk3 000c94bfafbd7d90 oralogvg
hdisk4 000c94bf8b6b451c backupvg
hdisk5 000c94bf9547fe1b DominoVG
hdisk6 000c94bfb062a75f IASvg
hdisk7 000c94bf8c76466a ora2datvg

But sometimes, when the machine reboots,
hdisk7 comes up Defined and a new hdisk8 is Available!

Does anyone know why?

Thanks!

 
Have you at some point had a disk replaced?

When hdisk7 becomes Defined, does hdisk8 have the same PVID?

Mike

"A foolproof method for sculpting an elephant: first, get a huge block of marble, then you chip away everything that doesn't look like an elephant."

 
No disk has been replaced.
hdisk2 and hdisk3 never become Defined. They belong to the 7133.

hdisk4-7 sometimes become Defined. They belong to the 3310.

Maybe the 3310 or the SCSI card has some bug under AIX?
 
When AIX boots, it has a quick chat with all the devices (cfgmgr). If a reply does not match the previous answer, AIX thinks it is a new device and gives it a new name.
So the Sun disks are not giving the same answer every time AIX asks who they are.
As IBM probably doesn't support the Sun disk box, try asking Sun if they have their own drivers for AIX.
 
Or check whether there is a firmware/driver update for the SCSI card that the Sun disks are attached to.
 
It's not that the Sun disk box gives a different answer. If the disk is reserved by the other node, AIX cannot read its PVID, and in that case it will always assume it's a new disk. These are called ghost disks, and it's important not to remove them: they're necessary for LVM.
 
One way to confirm that DukeSSD is right (which I'm pretty sure he is) is to check "lsdev -Cc disk" when one of the Sun drives becomes "Defined". The new hdisk8 should show a different SCSI id (the "number,number" pairs at the end of the connection address).
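The comparison described above can be done mechanically. What follows is a minimal sketch, not a tested AIX tool: `find_ghost_pairs` is a hypothetical helper name, and it assumes the `lsdev -Cc disk` layout shown in this thread (name, state, connection address, description). It only looks at connection addresses that contain a comma (the SCSI "id,lun" pair), because the SSA disks here all report the same `1A-08-L` location and would otherwise pair up falsely.

```shell
#!/bin/sh
# find_ghost_pairs (hypothetical helper): read `lsdev -Cc disk` output on
# stdin and print "<defined-hdisk> <available-hdisk> <address>" for every
# SCSI connection address held by one Defined and one Available hdisk.
find_ghost_pairs() {
    awk '
        index($3, ",") == 0 { next }            # skip non-SCSI addresses (e.g. SSA)
        $2 == "Defined"     { defined[$3] = $1 }
        $2 == "Available"   { available[$3] = $1 }
        END {
            for (addr in defined)
                if (addr in available)
                    print defined[addr], available[addr], addr
        }
    '
}
```

On a node this would be run as `lsdev -Cc disk | find_ghost_pairs`. A line of output means one address is claimed by two hdisk names, which points at the same physical disk being configured twice. No output means every SCSI address maps to a single hdisk, and a different address on the new hdisk would suggest one of the other causes.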

Other, possible but less likely, causes:

1.) You said the 3310 was connected to two SCSI cards, but your posted lsdev output has all the drives connected to one card. If hdisk8, when it shows up, has more than the SCSI id different in its connection address, there could be a timing issue, where the first controller doesn't detect a drive and the second gets to pick it up.

2.) All of the advice so far has been under the assumption that there are only four drives in the 3310. I'm not familiar with the product, but could there be more drives, and a SCSI id conflict?

Rod Knowlton

IBM Certified Advanced Technical Expert pSeries and AIX 5L
CompTIA Linux+
CompTIA Security+

 
Rod, I assumed the two scsi cards attached to the sun disks were 1 in each host and we only got the lsdev from 1 server (although that wouldn't account for both SSA cards being in this host).
If not, the other sun scsi adapter cannot see any disks, 'cos they all show on 34-08.
Guess we need lsdev -Cc disk and lsdev -Cc adapter from each server to be sure.
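Once the listings from both servers are in hand, the PVID column of `lspv` is the easiest thing to compare, since a shared disk should carry the same PVID under the same hdisk name on each node. The sketch below is only an illustration under assumptions: `compare_lspv` and the file names are hypothetical, the `lspv` output of each node is assumed to have been saved to a file in the three-column form shown in this thread, and rootvg is skipped because each node has its own internal disks.

```shell
#!/bin/sh
# compare_lspv FILE_A FILE_B (hypothetical helper)
# FILE_A and FILE_B hold saved `lspv` output from the two nodes.
# Reports disks on the second node with no readable PVID, PVIDs that map
# to different hdisk names, and shared PVIDs the second node does not see.
compare_lspv() {
    awk '
        # First file: remember the hdisk name for each shared PVID.
        NR == FNR { if ($3 != "rootvg" && $2 != "none") a[$2] = $1; next }
        # Second file from here on.
        $2 == "none"   { print "no PVID on second node: " $1; next }
        $3 == "rootvg" { next }
        ($2 in a) && a[$2] != $1 {
            print "name mismatch for " $2 ": " a[$2] " vs " $1
        }
        { seen[$2] = 1 }
        END {
            for (p in a)
                if (!(p in seen))
                    print "missing on second node: " p " (" a[p] ")"
        }
    ' "$1" "$2"
}
```

For example, after running `lspv > /tmp/lspv.nodeA` on one server and `lspv > /tmp/lspv.nodeB` on the other (both paths are placeholders) and copying the files to one place, `compare_lspv /tmp/lspv.nodeA /tmp/lspv.nodeB` prints nothing when the two nodes agree on the shared disks.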
 
One SCSI card in each host, one SSA card in each host.

The host sadcora2 is OK:

[sadcora2/#]lspv
hdisk0 000c94cfbf73212d rootvg
hdisk1 000c94bfbf4e6182 rootvg
hdisk2 000c94cf7c312ebc IASlog
hdisk3 000c94bfafbd7d90 oralogvg
hdisk4 000c94bf8b6b451c backupvg
hdisk5 000c94bf9547fe1b DominoVG
hdisk6 000c94bfb062a75f IASvg
hdisk7 000c94bf8c76466a ora2datvg
[sadcora2/#]lsdev -Cc disk
hdisk0 Available 40-60-00-4,0 16 Bit LVD SCSI Disk Drive
hdisk1 Available 40-60-00-12,0 16 Bit LVD SCSI Disk Drive
hdisk2 Available 1A-08-L SSA Logical Disk Drive
hdisk3 Available 1A-08-L SSA Logical Disk Drive
hdisk4 Available 34-08-00-0,0 Other SCSI Disk Drive
hdisk5 Available 34-08-00-0,1 Other SCSI Disk Drive
hdisk6 Available 34-08-00-0,2 Other SCSI Disk Drive
hdisk7 Available 34-08-00-0,3 Other SCSI Disk Drive

But the other node is not OK:

[sadcora1/#]lspv
hdisk0 000c94bf000be016 rootvg
hdisk1 000c94bf9437ff1c rootvg
hdisk2 000c94cf7c312ebc IASlog
hdisk3 000c94bfafbd7d90 oralogvg
hdisk4 000c94bf8b6b451c backupvg
hdisk5 000c94bf9547fe1b DominoVG
hdisk6 000c94bfb062a75f IASvg
hdisk8 none None
[sadcora1/#]lsdev -Cc disk
hdisk0 Available 40-60-00-4,0 16 Bit LVD SCSI Disk Drive
hdisk1 Available 40-60-00-8,0 16 Bit LVD SCSI Disk Drive
hdisk2 Available 1A-08-L SSA Logical Disk Drive
hdisk3 Available 1A-08-L SSA Logical Disk Drive
hdisk4 Available 34-08-00-0,0 Other SCSI Disk Drive
hdisk5 Available 34-08-00-0,1 Other SCSI Disk Drive
hdisk6 Available 34-08-00-0,2 Other SCSI Disk Drive
hdisk7 Defined 34-08-00-0,3 Other SCSI Disk Drive
hdisk8 Available 34-08-00-0,3 Other SCSI Disk Drive

Now HACMP cannot work correctly.

 
hdisk7 and hdisk8 are the same disk: the location is the same, 34-08-00-0,3. The 34-08 is the SCSI card location and the 0,3 is the SCSI address, so both hdisks are on the same adapter at the same address. They are the same disk.
This means both are configured in the ODM, with the same PVID, etc.
HA looks for hdisk7 in the ODM, finds the device is Defined, and therefore can't access it.
My first posts apply: the disk is not reporting in consistently on boot, so AIX assumes it is a different disk.
Back up this VG - NOW.
Anything for either disk in the error report?
Check with IBM and Sun for device driver / firmware updates for both the adapter and the disks.
Try reseating the disk.
Try the disk in another slot (I'm not familiar with the 3310, so I don't know how many slots it has or how the disks' SCSI addresses are set).
Check the jumpers on the drive match the other disks (check for delayed spin up, term power, etc).
Check the SCSI cables are in good condition, no tight bends or kinks, etc.
Check the SCSI cables are firmly attached.
Check the SCSI termination.
For the cables and terminators, the only real way to check they are good is to replace them. The same goes for the disk, but if hdisks 4-7 all suffer the same problem, it is more likely a bus problem.
Check the SCSI card is fully seated in the PCI slot and screwed down tight.
Good luck.

 
Yeah, stick it in the inittab and bob's your uncle....
Not the ideal answer though, is it?
 
So DukeSSD are you saying that fixing the problem so that things work properly is a bad thing? Sometimes a root cause is never determined and you just have to move on with a solution.

One change to the commands I suggested above: only run cfgmgr down the proper SCSI bus, not the whole system.


Jim Hirschauer
 
No, you are right: as AIX 4.3.3 is out of defect support, there is not likely to be any better option.
I suppose you could script a check to determine which actual disk configures as which hdisk and, if it is not as expected, rmdev/mkdev them back into the right order. As the rootvg disks don't seem to be affected, this may be the quickest and neatest solution.
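A rough sketch of what such a check might look like follows. It is deliberately a dry run: it only prints the `rmdev`/`mkdev` commands it would suggest and never executes them, since advice elsewhere in this thread is to leave ghost disks alone under HACMP. `plan_rename` is a hypothetical name, the input is assumed to be `lsdev -Cc disk` output in the layout posted above, and only SCSI addresses (those containing an "id,lun" comma) are considered.

```shell
#!/bin/sh
# plan_rename (hypothetical helper): read `lsdev -Cc disk` output on stdin.
# For each SCSI address held by both a Defined hdisk (the expected name)
# and an Available hdisk (the extra name), print the commands that would
# drop the extra name and bring the expected name back. Nothing is executed.
plan_rename() {
    awk '
        index($3, ",") == 0 { next }        # ignore SSA and other locations
        $2 == "Defined"     { old[$3] = $1 }
        $2 == "Available"   { cur[$3] = $1 }
        END {
            for (addr in old)
                if (addr in cur) {
                    print "rmdev -dl " cur[addr]   # delete the extra definition
                    print "mkdev -l " old[addr]    # reconfigure the original name
                }
        }
    '
}
```

Running `lsdev -Cc disk | plan_rename` and reviewing the printed commands by hand keeps a person in the loop. Whether removing the extra hdisk is safe while the other node holds a reserve on the disk is exactly the open question in this thread, so treat the output as a suggestion, not a fix.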
 
Yes, the problem will reappear, but not every time.

On both nodes, hdisk4-7 all suffer the same problem.

Maybe the drivers or firmware need to be updated.

 
Thanks for all the replies.

Where can I find the updated firmware or driver?
 
Copy paste from "HACMP Systems Administration I: Planning and Implementation" Student Notebook:

Code:
During the AIX boot sequence, the configuration manager (cfgmgr) accesses all the shared disks (and all other disks and devices). Each time it accesses a physical volume at a particular hardware address, it tries to determine if the physical volume is the same actual physical volume that was last seen at the particular hardware address. With SCSI disks, it does this by attempting to read the physical volume's ID (PVID) from the disk. This operation fails if the disk is currently reserved to another node. Consequently, the configuration manager is not sure if the physical volume is the one it expects or is a different physical volume. In order to be safe, it assumes that it is a different physical volume and assigns it a temporary hdisk name. This temporary hdisk name is called a ghost disk.

[...]

It is very important that if ghost disks occur, they be left in the AIX device configuration as their presence is necessary for the correct operation of the LVM when the volume group is ultimately online.
 