hdisk0 failure

Status
Not open for further replies.

hellsbells

Technical User
Jan 9, 2003
32
GB
I am running AIX 5.1 on a pSeries 640 and have an hdisk0 disk error in my error report. I am not confident that I mirrored the disks on this box.

There are four internal disks (hdisk0, hdisk1, hdisk2 and hdisk3). The machine seems to stop if you ask something it doesn't like.

I can swap hdisk0 out for another disk, but first I want to unmirror it and then remove it from rootvg. Before that I need to make sure it is properly mirrored. I guess the machine would have stopped altogether if it were not mirrored, but anything that touches the /tmp filesystem fails. For example, snap -gfLC results in /usr/sbin/snap[3562]: /tmp/sh38882.13: cannot create
Similarly, if I run lsvg rootvg I get the following: 0516-070 : LVM system call found an unaccountable
internal error.

I can get the physical disk info on hdisk1 - this is as follows:-

# lspv hdisk1
PHYSICAL VOLUME: hdisk1 VOLUME GROUP: rootvg
PV IDENTIFIER: 0046947af2292c22 VG IDENTIFIER 0046947a00004c00000000f6f09d9c44
PV STATE: active
STALE PARTITIONS: 0 ALLOCATABLE: yes
PP SIZE: 32 megabyte(s) LOGICAL VOLUMES: 1
TOTAL PPs: 542 (17344 megabytes) VG DESCRIPTORS: 1
FREE PPs: 538 (17216 megabytes) HOT SPARE: no
USED PPs: 4 (128 megabytes)
FREE DISTRIBUTION: 109..108..104..108..109
USED DISTRIBUTION: 00..00..04..00..00

However if I try the same on the other disks I get the following :-

# lsvg hdisk3
0516-306 : Unable to find volume group hdisk3 in the Device
Configuration Database.
# lsvg hdisk0
0516-306 : Unable to find volume group hdisk0 in the Device
Configuration Database.
# lsvg hdisk2
0516-306 : Unable to find volume group hdisk2 in the Device
Configuration Database.

Can anyone tell me what I should do next please?

Many thanks
 
Hi,

Please post the output of lspv -l hdisk1

Cheers


PSD
IBM Certified Specialist - AIX V4.3 Systems Support
IBM Certified Specialist - AIX V4 HACMP
 
It doesn't look good..........

# lspv -l hdisk1
0516-070 lspv: LVM system call found an unaccountable
internal error.
 
Please post the output of the following:

lspv
lsvg -l rootvg
lsvg -l rootvg|awk '( $0 !~ /LP|:/){print $1}'|while read LV
do
lslv -l $LV
done
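For anyone following along, here is a minimal sketch of what the awk filter in that loop does, run against made-up sample text rather than a live system. The function names lv_names and sample_lsvg_l are hypothetical, and the sample lines are illustrative only, not output from the poster's machine.

```shell
# Extract logical volume names from `lsvg -l <vg>` output read on stdin.
# The filter drops the "rootvg:" title line (it contains ":") and the
# column header (it contains "LP"), then prints the first field.
lv_names() {
    awk '$0 !~ /LP|:/ { print $1 }'
}

# Illustrative sample only; NOT taken from the system in this thread.
sample_lsvg_l() {
    cat <<'EOF'
rootvg:
LV NAME             TYPE       LPs   PPs   PVs  LV STATE      MOUNT POINT
hd5                 boot       1     2     2    closed/syncd  N/A
hd6                 paging     16    32    2    open/syncd    N/A
hd4                 jfs        2     4     2    open/syncd    /
EOF
}

# Print one LV name per line from the sample.
sample_lsvg_l | lv_names
```

One caveat with this filter: any data line that happens to contain "LP" or ":" would also be skipped, so it is a convenience for typical rootvg layouts rather than a robust parser.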
 
# lspv
hdisk0 0046947a0714b29f rootvg
hdisk1 0046947af2292c22 rootvg
hdisk2 0046947af2294e6d rootvg
hdisk3 0046947af229699d rootvg

# lsvg -l rootvg
0516-070 : LVM system call found an unaccountable
internal error.
 
I am really not sure what is going on with this box.

One minute if I try a lspv on hdisk0 or hdisk3 I get an error back but the next I get the stats I would expect :-

# lspv hdisk3
PHYSICAL VOLUME: hdisk3 VOLUME GROUP: rootvg
PV IDENTIFIER: 0046947af229699d VG IDENTIFIER 0046947a00004c00000000f6f09d9c44
PV STATE: active
STALE PARTITIONS: 0 ALLOCATABLE: yes
PP SIZE: 32 megabyte(s) LOGICAL VOLUMES: 2
TOTAL PPs: 542 (17344 megabytes) VG DESCRIPTORS: 1
FREE PPs: 421 (13472 megabytes) HOT SPARE: no
USED PPs: 121 (3872 megabytes)
FREE DISTRIBUTION: 109..95..00..108..109
USED DISTRIBUTION: 00..13..108..00..00
# lspv hdisk2
0516-070 lspv: LVM system call found an unaccountable
internal error.
PHYSICAL VOLUME: hdisk2 VOLUME GROUP: rootvg
PV IDENTIFIER: 0046947af2294e6d VG IDENTIFIER 0046947a00004c00000000f6f09d9c44
PV STATE: ???????
STALE PARTITIONS: ??????? ALLOCATABLE: ???????
PP SIZE: ??????? LOGICAL VOLUMES: ???????
TOTAL PPs: ??????? VG DESCRIPTORS: ???????
FREE PPs: ??????? HOT SPARE: ???????
USED PPs: ???????
FREE DISTRIBUTION: ???????
USED DISTRIBUTION: ???????
# lspv hdisk0
PHYSICAL VOLUME: hdisk0 VOLUME GROUP: rootvg
PV IDENTIFIER: 0046947a0714b29f VG IDENTIFIER 0046947a00004c00000000f6f09d9c44
PV STATE: active
STALE PARTITIONS: 0 ALLOCATABLE: yes
PP SIZE: 32 megabyte(s) LOGICAL VOLUMES: 9
TOTAL PPs: 542 (17344 megabytes) VG DESCRIPTORS: 1
FREE PPs: 0 (0 megabytes) HOT SPARE: no
USED PPs: 542 (17344 megabytes)
FREE DISTRIBUTION: 00..00..00..00..00
USED DISTRIBUTION: 109..108..108..108..109
# lspv hdisk1
0516-070 lspv: LVM system call found an unaccountable
internal error.
PHYSICAL VOLUME: hdisk1 VOLUME GROUP: rootvg
PV IDENTIFIER: 0046947af2292c22 VG IDENTIFIER 0046947a00004c00000000f6f09d9c44
PV STATE: ???????
STALE PARTITIONS: ??????? ALLOCATABLE: ???????
PP SIZE: ??????? LOGICAL VOLUMES: ???????
TOTAL PPs: ??????? VG DESCRIPTORS: ???????
FREE PPs: ??????? HOT SPARE: ???????
USED PPs: ???????
FREE DISTRIBUTION: ???????
USED DISTRIBUTION: ???????
# lspv hdisk2
PHYSICAL VOLUME: hdisk2 VOLUME GROUP: rootvg
PV IDENTIFIER: 0046947af2294e6d VG IDENTIFIER 0046947a00004c00000000f6f09d9c44
PV STATE: active
STALE PARTITIONS: 0 ALLOCATABLE: yes
PP SIZE: 32 megabyte(s) LOGICAL VOLUMES: 1
TOTAL PPs: 542 (17344 megabytes) VG DESCRIPTORS: 1
FREE PPs: 538 (17216 megabytes) HOT SPARE: no
USED PPs: 4 (128 megabytes)
FREE DISTRIBUTION: 109..108..104..108..109
USED DISTRIBUTION: 00..00..04..00..00
# lspv hdisk3
0516-070 lspv: LVM system call found an unaccountable
internal error.
PHYSICAL VOLUME: hdisk3 VOLUME GROUP: rootvg
PV IDENTIFIER: 0046947af229699d VG IDENTIFIER 0046947a00004c00000000f6f09d9c44
PV STATE: ???????
STALE PARTITIONS: ??????? ALLOCATABLE: ???????
PP SIZE: ??????? LOGICAL VOLUMES: ???????
TOTAL PPs: ??????? VG DESCRIPTORS: ???????
FREE PPs: ??????? HOT SPARE: ???????
USED PPs: ???????
FREE DISTRIBUTION: ???????
USED DISTRIBUTION: ???????
 
Do you remember how those disks were mirrored?
If so, run bosboot against a disk from the good pair and then boot from it.
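If lsvg -l rootvg can be made to return output, one rough way to check mirroring is to compare the LPs and PPs columns: a mirrored LV uses at least two physical partitions per logical partition. The sketch below assumes the usual column order of lsvg -l (LV NAME, TYPE, LPs, PPs, PVs, ...); the function name unmirrored_lvs is hypothetical and the sample input is invented for illustration.

```shell
# Read `lsvg -l <vg>` output on stdin and print every LV whose PP count is
# less than twice its LP count, i.e. an LV that has only a single copy.
# Assumes fields: 1=LV NAME, 2=TYPE, 3=LPs, 4=PPs.
unmirrored_lvs() {
    awk '$0 !~ /LP|:/ && $3 > 0 && $4 < 2 * $3 { print $1 }'
}

# Invented sample: hd6 has 16 LPs on 16 PPs, so it is not mirrored.
cat <<'EOF' | unmirrored_lvs
rootvg:
LV NAME             TYPE       LPs   PPs   PVs  LV STATE      MOUNT POINT
hd5                 boot       1     2     2    closed/syncd  N/A
hd6                 paging     16    16    1    open/syncd    N/A
hd4                 jfs        2     4     2    open/syncd    /
EOF
```

An empty result would suggest every LV has at least two copies, but it says nothing about whether those copies are on separate disks; lslv -l on each LV is still needed for that.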
 
No I don't and I cannot find any documentation either..........

I cannot understand why one minute I can see the lspv output for hdisk0 and hdisk2 and I cannot see the lspv output for hdisk1 and hdisk3, but then later on it swaps over and I cannot see the output of hdisk0 and hdisk2 but can see the output for hdisk1 and hdisk3. Do you have any ideas?

Also do I not need to unmirror before I create a new boot image?

Many thanks for your help so far.
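For reference, the sequence usually followed when replacing one disk of a mirrored rootvg can be sketched as a dry run. This assumes rootvg really is mirrored, which has not been established for this box, and that the replacement disk comes back under the same hdisk name. The function replace_plan is hypothetical and only prints commands; it runs nothing.

```shell
# Print (do NOT execute) a typical command sequence for swapping out one
# disk of a mirrored rootvg.
# Usage: replace_plan <failing_disk> <surviving_disk>
replace_plan() {
    bad=${1:?usage: replace_plan failing_disk surviving_disk}
    good=${2:?usage: replace_plan failing_disk surviving_disk}
    cat <<EOF
unmirrorvg rootvg $bad
reducevg rootvg $bad
rmdev -dl $bad
# ...physically replace the disk here...
cfgmgr
extendvg rootvg $bad
mirrorvg rootvg $bad
bosboot -ad /dev/$bad
bootlist -m normal $good $bad
EOF
}

# Show the plan for the disks named in this thread.
replace_plan hdisk0 hdisk1
```

Printing the plan first makes it easy to review each step against the real disk layout before typing anything on a machine that is already misbehaving.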
 
I suggest a "synclvodm rootvg" to rebuild the ODM information for rootvg from the actual volumes.

If the ODM is fine though, and the disk is bad, then that will create some nasty problems in the ODM.

AIX is usually fairly good about handling hardware faults though, so I'm more likely to suspect that your ODM is corrupted. Might be a good idea to check out other devices to see if they are clobbered as well.
 
OK ... "bosboot -ad /dev/hdisk1" (so that your system can boot from hdisk1).
Now ... it looks like your rootvg was created across hdisk0 and hdisk2 and then mirrored onto hdisk1 and hdisk3, or something along those lines.
Try repeating the command I gave you two or more times.
If it fails, there is a way to find out which LVs are in rootvg:
odmget -q "name='rootvg'" CuDep|awk -F'"' '/dependency/{ print $2}'
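A quick sketch of how that awk step works, using invented CuDep stanzas in the usual odmget layout instead of a live ODM. The function names dependencies and sample_cudep are hypothetical.

```shell
# Pull the quoted value out of every "dependency" line of odmget output.
# With '"' as the field separator, field 2 is the text between the quotes.
dependencies() {
    awk -F'"' '/dependency/ { print $2 }'
}

# Invented sample stanzas; NOT read from the system in this thread.
sample_cudep() {
    cat <<'EOF'
CuDep:
        name = "rootvg"
        dependency = "hd5"

CuDep:
        name = "rootvg"
        dependency = "hd6"
EOF
}

# Print one dependent LV name per line.
sample_cudep | dependencies
```

Because this reads the ODM rather than the disks, it can still list the LVs when the LVM commands themselves are failing.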
 
sbix

When I try the bosboot I get the following:

# bosboot -ad /dev/hdisk1
0516-070 lslv: LVM system call found an unaccountable
internal error.

0301-168 bosboot: The current boot logical volume, /dev/hd5,
does not exist on /dev/hdisk1.
# lspv hdisk1
PHYSICAL VOLUME: hdisk1 VOLUME GROUP: rootvg
PV IDENTIFIER: 0046947af2292c22 VG IDENTIFIER 0046947a00004c00000000f6f09d9c44
PV STATE: active
STALE PARTITIONS: 0 ALLOCATABLE: yes
PP SIZE: 32 megabyte(s) LOGICAL VOLUMES: 1
TOTAL PPs: 542 (17344 megabytes) VG DESCRIPTORS: 1
FREE PPs: 538 (17216 megabytes) HOT SPARE: no
USED PPs: 4 (128 megabytes)
FREE DISTRIBUTION: 109..108..104..108..109
USED DISTRIBUTION: 00..00..04..00..00


I have also tried the odmget command and it returns nothing.

Is it a good idea to try the synclvodm rootvg now as Chapter11 suggests?

Many thanks
 
You got nothing at all back from the odmget?
As a check, try "odmget CuDep |grep rootvg". If that returns no lines either, you probably have a problem in the ODM as well, so yes, I would try the synclvodm.
 
This is weird: last night this command didn't work when I ran it, but this morning it does. At the moment I can see hdisk0 but not hdisk1 when I run lspv.

The output of the odm commands is:

# odmget CuDep | grep rootvg
name = "rootvg"
name = "rootvg"
name = "rootvg"
name = "rootvg"
name = "rootvg"
name = "rootvg"
name = "rootvg"
name = "rootvg"
name = "rootvg"
name = "rootvg"
name = "rootvg"
name = "rootvg"

# odmget -q "name='rootvg'" CuDep|awk -F'"' '/dependency/{ print $2}'
hd5
hd6
hd8
hd4
hd2
hd9var
hd3
hd1
hd10opt
paging00
paging01
paging02

Cheers
 
Well ... looks like your ODM works today.
Retry:
lsvg -l rootvg|awk '( $0 !~ /LP|:/){print $1}'|while read LV
do
lslv -l $LV
done
 
That one still doesn't work !

# lsvg -l rootvg|awk '( $0 !~ /LP|:/){print $1}'|while read LV^Jdo^J lslv -l>
0516-070 : LVM system call found an unaccountable
internal error.
 
Actually, now that I think about it:

Look at errpt and see whether you are having SCSI bus issues.

An unterminated SCSI bus can cause unpredictable behavior from devices on the bus, such as sometimes working and sometimes not.
 
The output from the synclvodm is as follows:-


# synclvodm -Pv rootvg
0516-070 : LVM system call found an unaccountable
internal error


My errpt is showing the following errors:

IDENTIFIER TIMESTAMP  T C RESOURCE_NAME DESCRIPTION
A668F553   0220130304 P H hdisk0        DISK OPERATION ERROR
A668F553   0220050304 P H hdisk0        DISK OPERATION ERROR
A39F8A49   0220050304 T S syserrlg      ERROR LOGGING BUFFER OVERFLOW
A668F553   0219202104 P H hdisk0        DISK OPERATION ERROR
A39F8A49   0219202104 T S syserrlg      ERROR LOGGING BUFFER OVERFLOW
A668F553   0219202104 P H hdisk0        DISK OPERATION ERROR
A39F8A49   0219202104 T S syserrlg      ERROR LOGGING BUFFER OVERFLOW
D1E21BA3   0219210604 I S errdemon      LOG FILE EXPANDED TO REQUESTED SIZE
613E5F38   0219201604 P H LVDD          I/O ERROR DETECTED BY LVM
21F54B38   0219201604 P H hdisk0        DISK OPERATION ERROR
613E5F38   0219201604 P H LVDD          I/O ERROR DETECTED BY LVM
3CFF4028   0219201604 U H hdisk0        UNDETERMINED ERROR
613E5F38   0219201604 P H LVDD          I/O ERROR DETECTED BY LVM
A668F553   0219201604 P H hdisk0        DISK OPERATION ERROR
613E5F38   0219201604 P H LVDD          I/O ERROR DETECTED BY LVM
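To see at a glance where the errors are concentrated, the errpt summary can be tallied per resource. The sketch below feeds the errpt lines from this post through a small awk counter and also decodes the TIMESTAMP column, which errpt prints as MMDDhhmmYY. The function names errpt_by_resource, errpt_when and sample_errpt are hypothetical.

```shell
# Count errpt summary lines per RESOURCE_NAME (5th field), busiest first.
errpt_by_resource() {
    awk '$1 != "IDENTIFIER" && NF >= 6 { n[$5]++ }
         END { for (r in n) print n[r], r }' | sort -rn
}

# Turn an errpt timestamp (MMDDhhmmYY) into "MM/DD/YY hh:mm".
errpt_when() {
    printf '%s\n' "$1" | sed 's|^\(..\)\(..\)\(..\)\(..\)\(..\)$|\1/\2/\5 \3:\4|'
}

# The errpt summary as posted in this thread.
sample_errpt() {
    cat <<'EOF'
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
A668F553 0220130304 P H hdisk0 DISK OPERATION ERROR
A668F553 0220050304 P H hdisk0 DISK OPERATION ERROR
A39F8A49 0220050304 T S syserrlg ERROR LOGGING BUFFER OVERFLOW
A668F553 0219202104 P H hdisk0 DISK OPERATION ERROR
A39F8A49 0219202104 T S syserrlg ERROR LOGGING BUFFER OVERFLOW
A668F553 0219202104 P H hdisk0 DISK OPERATION ERROR
A39F8A49 0219202104 T S syserrlg ERROR LOGGING BUFFER OVERFLOW
D1E21BA3 0219210604 I S errdemon LOG FILE EXPANDED TO REQUESTED SIZE
613E5F38 0219201604 P H LVDD I/O ERROR DETECTED BY LVM
21F54B38 0219201604 P H hdisk0 DISK OPERATION ERROR
613E5F38 0219201604 P H LVDD I/O ERROR DETECTED BY LVM
3CFF4028 0219201604 U H hdisk0 UNDETERMINED ERROR
613E5F38 0219201604 P H LVDD I/O ERROR DETECTED BY LVM
A668F553 0219201604 P H hdisk0 DISK OPERATION ERROR
613E5F38 0219201604 P H LVDD I/O ERROR DETECTED BY LVM
EOF
}

sample_errpt | errpt_by_resource
errpt_when 0220130304
```

On the lines posted, every hardware (class H) entry is against hdisk0 or LVDD, and none names another disk or a SCSI adapter. The buffer overflow entries mean some errors were never logged, though, so the list may be incomplete.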

 
It's time for a nice restore from sysbackup
 
yeah, that's pretty whack

some additional thoughts:

check to see if /tmp is full...

Something you might consider if you get desperate: physically remove hdisk0, boot from CD, get to a shell, and check the status of rootvg there. In particular, check whether all of the LVs on the remaining copy are current. If any are stale, the system is toast.

the exception would be /tmp, you could recreate that (from maintenance mode) without a problem.

If everything is ok, you should also be able to perform the LVM work necessary to eliminate artifacts from hdisk0.

It's nasty work, but not beyond reason if you're familiar with operating at that level.
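The stale check described above can be sketched as a filter over lsvg -l output, assuming that command works once the bad disk is out and that the LV STATE column is the sixth field, as it normally is. The function name stale_lvs is hypothetical and the sample input is invented.

```shell
# Read `lsvg -l <vg>` output on stdin and print every LV whose LV STATE
# (6th field, e.g. "open/syncd" or "open/stale") reports stale partitions.
stale_lvs() {
    awk '$0 !~ /LP|:/ && $6 ~ /stale/ { print $1 }'
}

# Invented sample: only hd4 is stale here.
cat <<'EOF' | stale_lvs
rootvg:
LV NAME             TYPE       LPs   PPs   PVs  LV STATE      MOUNT POINT
hd4                 jfs        2     4     2    open/stale    /
hd2                 jfs        40    80    2    open/syncd    /usr
hd3                 jfs        4     8     2    open/syncd    /tmp
EOF
```

If the only name printed is hd3 (/tmp), that matches the exception mentioned above, since /tmp can be recreated from maintenance mode.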
 