Slow boot -- Disk error

Status
Not open for further replies.

wasserl

Programmer
Jul 2, 2001
US
Hello everyone...

I am by no means an AIX sys admin but have been trying to fulfill the duties since we no longer have one...

Our system sort of crashed a few weeks ago. Ever since then it boots VERY SLOWLY. I would say it takes 45 minutes to an hour to get to a login prompt. I have tried everything I could think of, even restoring our most recent backup (still slow). When I look at the error log I see an A668F553 error (disk operation error).

If anyone has ANY ideas, please let me know what to try.

Thanks,
wasserl
 
What kind of system do you have, and are you using RAID?
Please send more info about your disks.
 
It is AIX 4.1.5 with VxWorks. I'm not really sure what you mean by a RAID. It has one hard disk and is NFS-mounted to our Solaris machine. The hard disk is really old (SCSI-1), about 3 GB I think.

I have tried the same thing on another disk of the same sort though, so I am pretty sure it isn't a hardware problem.

 
Check the error report and take note of any errors that have occurred since the "crash".

# errpt -a | pg
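
To narrow the report down to the entries that matter here, a minimal sketch: `errpt -j` selects entries by error identifier, and the small helper below (`count_errpt_id` is a made-up name, not an AIX command) counts how many summary lines carry a given identifier. It assumes the usual `errpt` summary layout with the identifier in the first column.

```shell
# Detailed entries for one error identifier only (the ID is the one
# reported in this thread):
#   errpt -a -j A668F553 | pg

# count_errpt_id ID: read `errpt` summary output on stdin and print how
# many entries carry that identifier in the first column.
# Hypothetical helper, defined here for illustration only.
count_errpt_id() {
    awk -v id="$1" '$1 == id { n++ } END { print n + 0 }'
}

# Usage on the AIX box:
#   errpt | count_errpt_id A668F553
```

A count that keeps growing across reboots suggests the error is still being logged on every boot rather than being left over from the original crash.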

Also, slow booting has a vast sea of problems associated with it, but more often than not, I've seen it caused by a network problem, such as the machine having an incorrect hostname, IP address, or default gateway. Check all of those out.
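
A rough sketch of that network check, assuming the standard `hostname`, `host`, and `netstat` commands are available. The helper `default_gateway` is a made-up name, and it assumes the routing table lists the default route with the literal destination `default`, as AIX normally prints it.

```shell
# Commands to run by hand on the AIX box:
#   hostname             # the name the machine thinks it has
#   host "$(hostname)"   # does that name resolve, and to the right address?
#   netstat -in          # configured interfaces and their addresses
#   netstat -rn          # routing table, including the default gateway

# default_gateway: read `netstat -rn` output on stdin and print the
# gateway of the "default" route, or nothing if there is no such route.
# Hypothetical helper, defined here for illustration only.
default_gateway() {
    awk '$1 == "default" { print $2; exit }'
}

# Usage:
#   netstat -rn | default_gateway
```

If the hostname does not resolve, or the gateway printed is unreachable, services started at boot can sit waiting for network timeouts, which is one common cause of a slow boot.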

When it is booting (hanging), is there anything on the console? Do you have an LED on the front of the machine that shows 3-digit codes? If so, what is the number when it is hung in the boot process?

 
The error that happens when it is slow is A668F553 (Disk Operation Error). I've checked all the network configurations and they are the same as they have always been, so they should be right.

The original screen where the hang begins looks like:

---------------------------------------------------------------------------------
MPU Clock Speed = 300Mhz
BUS Clock Speed = 67Mhz

Reset Vector Location : ROM Bank B
Mezzanine Configuration : Single-MPU
Current 60X-Bus Master : MPU
Idle MPU(s) : NONE

System Memory: 64MB, ECC Enabled (ECC-Memory Detected)
L2Cache: 512KB

Self Test / Boots about to Begin… Press <BREAK> at anytime to abort ALL

AutoBoot about to Begin … Press <ESC> to Bypass, <SPC> to Continue

Booting from: NCR53C825, Controller 0, Drive 0
Device Name: /pci@80000000/pci1000,3@c,0/harddisk@0,0

Loading Operating System

PL Loaded at: $03c49000
Residual-Data Located at: $03F2F000

---------------------------------------------------------------------------------


It then goes to the AIX "marble screen" with a small gray status box in the top right corner. Everything seems normal. Then it continues on to the login prompt, which takes about another half an hour to come up.

 
Oh, and we also checked to see if it could be having problems with NFS mounting, so we removed all of those lines from the filesystems file.
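
For anyone repeating this step, a minimal sketch of how to confirm that no NFS stanzas remain. It assumes the usual stanza layout of /etc/filesystems (a mount point ending in a colon, followed by indented `attribute = value` lines); `nfs_stanzas` is a made-up helper name.

```shell
# nfs_stanzas [FILE]: print the mount point of every stanza whose
# "vfs" attribute is "nfs". FILE defaults to /etc/filesystems.
# Hypothetical helper, defined here for illustration only.
nfs_stanzas() {
    awk '
        # A stanza header starts in column one and ends with a colon;
        # lines starting with "*" are comments.
        /^[^ \t*].*:[ \t]*$/ { name = $1; sub(/:$/, "", name); next }
        # Attribute lines look like:  vfs = nfs
        $1 == "vfs" && $2 == "=" && $3 == "nfs" { print name }
    ' "${1:-/etc/filesystems}"
}

# Usage:
#   nfs_stanzas                      # check the live file
#   nfs_stanzas /etc/filesystems.bak # check a saved copy (path is an example)
```

No output means no NFS mounts are defined in the file. On AIX, `lsfs -v nfs` should report the same information.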
 
Call the IBM man in -- sick disk.... Mike
michael.j.lacey@ntlworld.com
Email welcome if you're in a hurry or something -- but post in tek-tips as well please, and I will post my reply here as well.
 
Hi Wasserl,

If possible, could you let me know what kind of hardware you are using? Judging by your error description, it seems that your hdisk0 has gone bad or some block has been corrupted.
I would suggest running diagnostics in maintenance mode.

Hope this helps.


Dilip
 
Well....

We've tried the same system image on another disk AND another machine using all combinations. The same thing happens. We also had our hardware guys do tests on the drive. It seems to be fine.

Another weird thing... I tried to unmount different file systems so that I could run fsck on them. I tried umount all, and every one said something like "can't unmount, file system busy". So I tried to run fsck on a mounted drive. It said the results "won't be accurate", but it did say something about bad inodes and bad blocks. What does this mean and what can I do about it?
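
A "file system busy" message generally means some process still has a file open or its current directory there, and fsck results on a mounted file system are not reliable. A rough sketch of the usual order of operations for a file system that can be unmounted (such as /home), assuming a `fuser` that accepts `-c` to report on the whole file system; older releases may differ. The helper names `run` and `check_fs` are made up, and `DRY_RUN` is an illustrative variable that only prints the commands.

```shell
# run CMD...: print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# check_fs MOUNTPOINT: show what keeps the file system busy, unmount it,
# then run a report-only fsck. Hypothetical helper for illustration.
check_fs() {
    mp=$1
    run fuser -cu "$mp" || true     # list processes/users holding files open
    run umount "$mp" || return 1    # still busy? stop those processes first
    run fsck -n "$mp"               # -n: report problems, answer "no" to repairs
}

# Usage:
#   DRY_RUN=1 check_fs /home   # just print the commands
#   check_fs /home             # actually run them (as root)
```

The root file system and /usr cannot be unmounted on a running system, so checking those means booting into maintenance mode first. Once the report looks sane, `fsck -y` on the unmounted file system attempts the repairs, followed by `mount` to bring it back.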

We have a PowerPC with a SCSI-1 hard drive. We know it is old and outdated, but we are also currently having a problem with our Sun, which we tried to upgrade to an 18 GB drive, and we are getting SCSI parity errors (YES, we need lots of help -- mostly we need a sys admin).

Finally, how do I run diagnostics in maintenance mode??

Thanks for all the help and suggestions thus far...


 
Can you show us the output of the errpt command?
I hope it works...
Unix was made by and for smart people.
 