Tek-Tips is the largest IT community on the Internet today!

IBM xSeries Bad Stripe 2

Status
Not open for further replies.

no2broady

IS-IT--Management
May 11, 2004
38
IE
Hi,

I have an IBM xSeries 226 server that appears to have a bad stripe on a disk. It's configured for RAID 5 using 3 disks. The system hangs, usually overnight. A reboot brings the box back online, but I am having to do this more and more frequently.

The system is running ServeRAID Manager 7.10.18, and there are no errors in the ServeRAID Manager log. I am getting event ID 215 twice a day, which you can see in the attachment.

Could anyone shed some light on a possible solution?

No2broady
 
Sorry, I should have said I cannot identify the faulty disk.

No2broady
 
Well, you are downlevel on the code if you are using 7.10.18; the latest is 7.12.14. Also look here about bad stripes.
But this boils down to "No, there is no procedure or tool available for clearing or repairing a bad stripe while maintaining the existing array. In the instances when a bad stripe has occurred, the data contained within that stripe is incomplete, invalid, or inconsistent between the data and parity and a Bad Stripe Table entry is created to block that stripe to prevent hidden data corruption."

So, any time a bad stripe is encountered, the array must be deleted and restored from a backup. Also, if you have IBM maintenance, you can open a ticket and send them the ServeRAID log. They can parse it for errors, which may point to a failing drive, but the problem may simply be related to downlevel code and drivers.
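
To illustrate why a bad stripe cannot simply be repaired in place, here is a minimal Python sketch of XOR parity on one stripe of a 3-disk RAID 5 array. It is a simplified model, not how the ServeRAID firmware is actually implemented; the block contents and the helper names `xor_blocks` and `stripe_is_consistent` are made up for the example.

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def stripe_is_consistent(d0: bytes, d1: bytes, parity: bytes) -> bool:
    """A stripe is consistent when the data blocks XOR to the parity block."""
    return xor_blocks(d0, d1) == parity

# One stripe on a 3-disk array: two data blocks plus one parity block.
# (Illustrative values only.)
d0 = b"\x10\x20\x30\x40"
d1 = b"\x0f\x0e\x0d\x0c"
parity = xor_blocks(d0, d1)

# Normal failure: one block is known to be missing, so it can be
# rebuilt from the two surviving blocks.
rebuilt_d0 = xor_blocks(d1, parity)

# Bad stripe: a block is present but no longer matches the parity.
corrupted_d1 = b"\x0f\x0e\xff\x0c"
consistent = stripe_is_consistent(d0, corrupted_d1, parity)
```

In the second case the mismatch is detectable, but with a single parity block nothing in the stripe says which of the three blocks is the wrong one, so there is no trustworthy source to rebuild from. That is consistent with IBM's statement that the stripe is blocked rather than repaired.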
 
Man, you just beat me to the punch. This could be caused by A) never running an array synchronization, B) a faulty disk drive, or C) an out-of-date driver or out-of-date controller firmware.

What IBM says:

I would still use the bootable CD to check the status of all the drives and maybe even test each one to see if anything needs to be replaced. You want to make sure all the disks are okay before you nuke/reload.

Plus you should check and update the firmware on the controller as well as the driver for the operating system. Probably do the firmware after you wipe out the array - nothing to worry about at that point. Have the latest driver ready for O.S. installation.

Wow - I never had one of those happen to me. Thank goodness.
 
Thanks for your reply.

I read the part about deleting the array. The problem is that I don't have a spare disk, nor a way of telling which disk is faulty so that I can swap bad for good. I don't have IBM maintenance on that system, so that option unfortunately isn't a go.

If I do delete the array and rebuild from a backup, will the bad stripe be ignored?

No2broady
 
To be clear, deleting the array and recreating will fix everything, provided you don't have a bad disk.

Follow my last post: boot to the ServeRAID CD and then check each disk. There is no sense in trying to fix the array issue with an O.S. reload if bad hard drives are still present.

You should always have a spare drive in stock <<< optimally
 
Thanks for the info. I am currently burning a copy of the ServeRAID CD. You'll have to forgive my ignorance, as I have not run this before. So I can basically boot into that CD and run a diagnostic routine from there? Does this update the firmware in the process?
 
Another thing: I ran CHKDSK and the results are below. It found errors, as you can see, but I haven't run it in fix mode yet because the system is still in use today. Could this do anything to work around the bad stripe, or am I being wishful!?!


C:\>chkdsk
The type of the file system is NTFS.

WARNING! F parameter not specified.
Running CHKDSK in read-only mode.

CHKDSK is verifying files (stage 1 of 3)...
259584 file records processed.
File verification completed.
2998 large file records processed.
0 bad file records processed.
0 EA records processed.
4 reparse records processed.
CHKDSK is verifying indexes (stage 2 of 3)...
895133 index entries processed.
Index verification completed.
5 unindexed files processed.
CHKDSK is verifying security descriptors (stage 3 of 3)...
259584 security descriptors processed.
Security descriptor verification completed.
21287 data files processed.
CHKDSK is verifying Usn Journal...
537530128 USN bytes processed.
Usn Journal verification completed.
Windows found problems with the file system.
Run CHKDSK with the /F (fix) option to correct these.

286746607 KB total disk space.
131644778 KB in 233928 files.
92288 KB in 21288 indexes.
0 KB in bad sectors.
922678 KB in use by the system.
65536 KB occupied by the log file.
154086862 KB available on disk.

512 bytes in each allocation unit.
573493215 total allocation units on disk.
308173725 allocation units available on disk.

C:\>
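
For anyone who wants to check reports like this one automatically, here is a minimal Python sketch that pulls the two most relevant facts out of saved CHKDSK text. The function name `summarize_chkdsk` is hypothetical, and it assumes the English wording shown in the output above; the `sample` string is just an excerpt of that output.

```python
import re

def summarize_chkdsk(report: str) -> dict:
    """Extract the problem flag and the bad-sector count from CHKDSK text."""
    problems = "Windows found problems with the file system" in report
    match = re.search(r"(\d+) KB in bad sectors", report)
    bad_kb = int(match.group(1)) if match else None
    return {"problems_found": problems, "bad_sector_kb": bad_kb}

# Excerpt of the read-only CHKDSK run posted above.
sample = """\
Windows found problems with the file system.
Run CHKDSK with the /F (fix) option to correct these.

286746607 KB total disk space.
0 KB in bad sectors.
"""

summary = summarize_chkdsk(sample)
```

For the run above this gives file system problems but 0 KB in bad sectors. CHKDSK works on the NTFS volume that the controller presents, so running it with /F may tidy up the file system, but going by the IBM statement quoted earlier it should not be expected to clear a Bad Stripe Table entry on the array.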
 
It is a bootable CD. It should be able to upgrade the controller BIOS and firmware. What controller do you have??


You'll have to look at the documentation, but you should be able to see the status of logical drives, arrays and each individual disk (online, offline, etc.) and be able to test each drive.


Updating individual hard drive firmware: this is just an example of what you need. It may not be the latest for your server; I leave that to you. But it's the type of CD you need.
 
Hi goombawaho,

I have a ServeRAID-6i controller. I am just going through a step-by-step plan for how I'm going to attack this.

Firstly, going by the IBM guide, I have to update the device driver via the OS, then boot into the ServeRAID firmware/diagnostics CD and follow the wizard prompts.

I have one question, do I still have to update the hard drive firmware too?

I am trying to get to a point where I can get the system to tell me exactly which disk needs replacing and obviously not follow the nuke/reload road!

Thanks again for your help.
 
I guess the hard drive firmware update is optional at this point. I would just tell you to do it AFTER you have nuked the array and BEFORE you recreate it. At that point you have nothing to lose.

And YES, I am a believer in checking for HDD firmware updates. But you can read the TXT file that is included that will tell you what is fixed. It might be critical, it might be minor and it might not even affect any of the drives you have.

How it works is that these update CDs can update multiple hard drives. IBM sources hard drives from multiple vendors. You may have an IBM drive model XYZ123 (1TB SATA 10,000 rpm, whatever), but it could be sourced from multiple vendors. So you might have 2 Seagates and 1 Western Digital in your server. Your particular hard drives might not need updating, and then again they might.

I've seen new servers with 6 hard drives purchased at the same time containing 3 different brands of drives. You might think they were all exactly the same. Not.
 