Hipath 4000 Unix Reboots

Status
Not open for further replies.

kevin906

MIS
Aug 4, 2006
167
US
Every few weeks, at random, we are seeing UnixWare reboots at this site, with errors logged against these areas:

WARNING: msgcnt 1 vxfs: mesg 020: vx_logerr - /dev/dsk/c0b0t0d0sb file system log error 5
WARNING: msgcnt 2 vxfs: mesg 031: vx_disable - /dev/dsk/c0b0t0d0sb file system disabled
WARNING: msgcnt 3 vxfs: mesg 020: vx_logerr - /dev/dsk/c0b0t0d0s5 file system log error 5
WARNING: msgcnt 4 vxfs: mesg 031: vx_disable - /dev/dsk/c0b0t0d0s5 file system disabled
WARNING: msgcnt 5 vxfs: mesg 020: vx_logerr - /dev/root file system log error 5
WARNING: msgcnt 6 vxfs: mesg 031: vx_disable - /dev/root file system disabled
WARNING: msgcnt 7 vxfs: mesg 057: vx_esum_bad - /dev/dsk/c0b0t0d0s5 file system extent allocation unit summary number 0 marked bad
WARNING: msgcnt 8 vxfs: mesg 057: vx_esum_bad - /dev/dsk/c0b0t0d0s5 file system extent allocation unit summary number 1 marked bad
WARNING: msgcnt 9 vxfs: mesg 057: vx_esum_bad - /dev/dsk/c0b0t0d0s5 file system extent allocation unit summary number 3 marked bad
WARNING: msgcnt 10 vxfs: mesg 057: vx_esum_bad - /dev/dsk/c0b0t0d0sb file system extent allocation unit summary number 1 marked bad
Aug 10 04:11:59 aasd[1028]: Checkpointing.
Aug 10 04:11:59 aasd[2215]: Unable to change to checkpoint diretory /usr/spool/aas/cp: I/O error
ADP: RMX requested Unix shutdown
ADP: RMX requested (soft) Unix shutdown
PANIC: wdcsh: watchdog panic

The disk appears to be fine, and capacity is within reasonable limits. Should we be changing the processor, or do we have something else going on here?
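For anyone who wants to see at a glance which slices are affected, here is a minimal Python sketch that groups the vxfs warnings by device. It assumes the log text has been copied to a workstation where Python is available; the function name `summarize_vxfs` and the regular expression are my own and are only fitted to the lines quoted above, not to any official vxfs message format.

```python
import re
from collections import defaultdict

# Pattern fitted to the vxfs warnings quoted in this post (an assumption,
# not an official format description).
VXFS_RE = re.compile(
    r"WARNING: msgcnt (?P<count>\d+) vxfs: mesg (?P<mesg>\d+): "
    r"(?P<name>\w+) - (?P<device>\S+) file system (?P<detail>.*)"
)

def summarize_vxfs(lines):
    """Group vxfs message names by device, in the order they were logged."""
    by_device = defaultdict(list)
    for line in lines:
        match = VXFS_RE.search(line)
        if match:  # non-vxfs lines (PANIC, aasd, ADP) are ignored
            by_device[match["device"]].append(match["name"])
    return dict(by_device)

# A few lines taken verbatim from the log above.
SAMPLE = [
    "WARNING: msgcnt 1 vxfs: mesg 020: vx_logerr - /dev/dsk/c0b0t0d0sb file system log error 5",
    "WARNING: msgcnt 2 vxfs: mesg 031: vx_disable - /dev/dsk/c0b0t0d0sb file system disabled",
    "WARNING: msgcnt 5 vxfs: mesg 020: vx_logerr - /dev/root file system log error 5",
    "PANIC: wdcsh: watchdog panic",
]

summary = summarize_vxfs(SAMPLE)
for device, names in summary.items():
    print(device, names)
```

Run against the full log, this would show every file system on the same disk (c0b0t0d0) logging an error and then being disabled, which is worth keeping in mind when weighing disk against processor.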

 
I would reinstall the Unix. It's 2 hours' work and won't affect call processing unless you have a HPPC using CAP through the SL100 address. Make a backup of your config first with HBR, then kick off a new installation. Then make sure you have the latest HF applied.

Or if you can't be bothered with that, you need at the least to make sure you are on the latest HF.

dis-aps:,psgl,y*; will tell you the main Unix KV. From a telnet session, ls /var/hf will tell you which HF you have already applied.

It does look a bit "disky" though, doesn't it? Even then, if you wanted to change the HD, you would still have to reinstall the UW.

 
Actually, UnixWare has been reloaded on this site twice in the last year for the same issue. Dipas_batch got corrupted twice, to the point of "Broken Pipe" on any access attempt, and of course the customer could not use Assistant in that state. I can't get into specifics, but we can't get the hotfixes due to the relationship we have with Siemens. The last time UnixWare was reloaded was only 2 months ago.
I keep wanting to think hardware because of the watchdog timeouts, but then again perhaps it's a runaway process firing the reboot via watchdog timeouts.

Thanks for any replies. You must have some TAC background since your answers are pretty much to the point...
 
Have you tried running an 'fsck' (or whatever it's called in that flavor of UnixWare) from Unix on the drive? Maybe there are just a few corrupt sectors that need to be locked out. Of course, if they are under any important files you may still be stuck reinstalling the OS...

If I were stuck having to do the OS anyway, I'd probably do the drive too, just to save needing to do it again... although that may not be economical if Siemens is still in the practice of including a free colonoscopy in the cost of the (what should be no more than $150) hard drive...

 
Assuming a hardware error when you see a watchdog panic is, I think, not unreasonable, but the watchdog error isn't the first thing that happens; it's the last. The file system does look unstable.
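To make the ordering point concrete, here is a rough Python sketch that tags each log line with a category and reports the order in which the categories first appear. It assumes the lines in the original post are in chronological order (the vxfs warnings carry no timestamps, so this cannot be verified from the post alone). The category names and marker strings are hypothetical choices of mine, picked to match the quoted text.

```python
# Hypothetical categories and the substrings used to recognise them.
CATEGORIES = [
    ("filesystem", "vxfs:"),
    ("application I/O", "I/O error"),
    ("shutdown request", "Unix shutdown"),
    ("watchdog", "watchdog panic"),
]

def classify(line):
    """Return the first category whose marker occurs in the line, else None."""
    for name, marker in CATEGORIES:
        if marker in line:
            return name
    return None

def first_seen_order(lines):
    """List categories in the order they first show up in the log."""
    order = []
    for line in lines:
        category = classify(line)
        if category and category not in order:
            order.append(category)
    return order

# Lines taken verbatim from the original post, in the order posted.
LOG = [
    "WARNING: msgcnt 1 vxfs: mesg 020: vx_logerr - /dev/dsk/c0b0t0d0sb file system log error 5",
    "WARNING: msgcnt 2 vxfs: mesg 031: vx_disable - /dev/dsk/c0b0t0d0sb file system disabled",
    "Aug 10 04:11:59 aasd[1028]: Checkpointing.",
    "Aug 10 04:11:59 aasd[2215]: Unable to change to checkpoint diretory /usr/spool/aas/cp: I/O error",
    "ADP: RMX requested Unix shutdown",
    "ADP: RMX requested (soft) Unix shutdown",
    "PANIC: wdcsh: watchdog panic",
]

order = first_seen_order(LOG)
print(" -> ".join(order))
```

Under that ordering assumption, the file system errors come first and the watchdog panic last, which is the sequence described above.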

If you don't have a service agreement with Siemens, you may struggle for parts.

At least you've cleared the first hurdle of reinstalling the UW, but if you can't get the HF, are you even on the latest complete version of UW? Or is that out of date as well?

If you have a duplex processor shelf I would swap the ADP processor with the BP, assuming you are up to speed on the processor cards and their configuration (i.e., jumper 6 if it's DPC5).

I would try to swap that HD out too, if you can source a suitable one. You can do this with no loss of service by swapping the MO out for the new HD and using INIT and DDRSM to duplicate the disk from the HD on ID 1 to the new HD on ID 6, then swap 6 to 1 and put the MO back.

If there was a service agreement, I would upgrade the software, apply latest HF, swap the processor and memory, the disk carrier, and the HD.

With no service agreement, you'll have to do what you can.

Swapping the ADP/BP processors would be a logical (and free) next step.

 
Black Box resale may prove to be a source for the drive.
 
Thanks for all replies, will proceed and advise as to what is found with next steps.
 
Kevin, what is the exact software level of the switch?

dis-aps:,psgl,y*;

and

./opt/bin/getosversion
ls /var/hf

from the UW.
 
DIS-APS:,PSGL,Y*;
H500: AMO APS STARTED
ADINIT STARTED
PROGRAM SYSTEM : Y0-EM0YC
VERSION NUMBER : 10
CORRECTION VERSION NUMBER : 001
PART NUMBER : P30252N4505B00005
PROGRAM SYSTEM WITH CODE SUBSYSTEMS
INTERFACE VERSION:
PROGRAM SYSTEM DOES NOT CONTAIN ANY INTERFACE VERSIONS

DIR SUBSYSTEM | | OMF SUBSYSTEM
-----------------------+-+-----------------------
ZMITSC00.Y0-EM0.10.001 |*|ZMITSC00.Y7-PMT.10.001

PROGRAM SYSTEM : Y7-PMTYT
VERSION NUMBER : 10
CORRECTION VERSION NUMBER : 001
PART NUMBER : P30252N4500M00140
PROGRAM SYSTEM WITH TEXT SUBSYSTEMS
INTERFACE VERSION:
PROGRAM SYSTEM DOES NOT CONTAIN ANY INTERFACE VERSIONS

DIR SUBSYSTEM | | OMF SUBSYSTEM
-----------------------+-+-----------------------
ZMITSC00.Y7-PMT.10.001 | |ZMITSC00.Y7-PMT.10.001

ADINIT COMPLETED
STATUS = H'0000
AMO-APS -111 SOFTWARE LOAD UPGRADE
DISPLAY COMPLETED;
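If this output needs comparing across sites or over time, a small Python sketch like the one below can pull the version fields out of a captured DIS-APS display. It is only fitted to the output pasted above: the field list and the function name `parse_aps` are my own assumptions, and it starts a new record whenever it meets a "PROGRAM SYSTEM :" line.

```python
# Field labels as they appear in the pasted display (assumed to be stable).
FIELDS = (
    "PROGRAM SYSTEM",
    "VERSION NUMBER",
    "CORRECTION VERSION NUMBER",
    "PART NUMBER",
)

def parse_aps(text):
    """Return one dict per program system found in a DIS-APS display."""
    records = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if not sep or key not in FIELDS:
            continue  # headers, tables and status lines are skipped
        if key == "PROGRAM SYSTEM":
            records.append({})  # a new program system block begins
        if records:
            records[-1][key] = value
    return records

# Excerpt copied from the display above.
DISPLAY = """\
PROGRAM SYSTEM : Y0-EM0YC
VERSION NUMBER : 10
CORRECTION VERSION NUMBER : 001
PART NUMBER : P30252N4505B00005
PROGRAM SYSTEM WITH CODE SUBSYSTEMS

PROGRAM SYSTEM : Y7-PMTYT
VERSION NUMBER : 10
CORRECTION VERSION NUMBER : 001
PART NUMBER : P30252N4500M00140
"""

records = parse_aps(DISPLAY)
for record in records:
    print(record["PROGRAM SYSTEM"], record["PART NUMBER"])
```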



# ./opt/bin/getosversion


PKGINST: UNIX OS
NAME: SCO UnixWare Operating System Rel. 8.0.0
CATEGORY: OS
ARCH: i386
PLATFORM: Unity A&S
PSTAMP: 0602220900OpenUNIX
INSTDATE: Jun 23 2009 09:09 PM
YAPS: M0-APL.31 / CV 140-00
------------------------------------------------
RMX YAPS: HiPath4000V30 SA05 RL05 (P30252N4505B00005)
#

There is no hotfix directory, so the software is behind.
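For record keeping, the getosversion output above can be turned into a simple lookup table. This is a minimal Python sketch, assuming the output has been captured as text and that every line of interest has the form "KEY: value"; the function name `parse_key_values` is my own.

```python
def parse_key_values(text):
    """Parse 'KEY: value' lines into a dict; lines without a colon are skipped."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so times such as 09:09 stay intact.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            info[key.strip()] = value.strip()
    return info

# Excerpt copied from the getosversion output above.
OUTPUT = """\
PKGINST: UNIX OS
NAME: SCO UnixWare Operating System Rel. 8.0.0
INSTDATE: Jun 23 2009 09:09 PM
YAPS: M0-APL.31 / CV 140-00
------------------------------------------------
RMX YAPS: HiPath4000V30 SA05 RL05 (P30252N4505B00005)
"""

info = parse_key_values(OUTPUT)
print(info["NAME"])
print(info["RMX YAPS"])
```

The part number in the RMX YAPS line is the same one shown in the first program system of the DIS-APS display, so the two outputs agree on the RMX level.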

 
Wow.

That's some seriously old software.

Although a software fault seems unlikely given the errors, who can say when it's a KV as old as that? I can't quite remember, but I think that might have been the first UV3.0 release.

You could do with upgrading the Unix to the latest (or at least a later) UV3.0 KV.

Perhaps someone friendly on the forum could copy their SCR area for you; it would install OK. The RMX wouldn't be affected, and licensing is not an issue, as your license is applied against RMX features and you would only be applying bug fixes to your UW.

Ideally though, you'd do the RMX and UW at the same time.

It won't break if you don't.

 