Corrupt data on new RAID array...I am worried

Status
Not open for further replies.

wahnula

Technical User
Jun 26, 2005
4,158
US
Hello,

I recently (3 days ago) changed out my RAID 5 controller, from the onboard Sil 3114 to a 3ware 9000 PCI card, because I have had too many occasions of dropped drives with the Sil: it would randomly drop a drive from the array. The dropped drive was visible as an 'orphan' and the array was always rebuilt w/o data loss.

I am using (3) Raptors in a mobile RAID rack. OS is SBS 2003 Premium.

Today, with the new 3ware array, I went to open an Excel file and it was corrupt. Gone. I went to yesterday's backup, and the file was fine. Then found another corrupt file. Backup was fine, starting to worry now.

Something happened today to corrupt those files. In all my years (7) of running this small network I have NEVER had a corrupt Excel file, now I have two? Something's up.

Could the same thing that caused a drive to be dropped from an array cause data corruption? In other words, if there was a hardware problem, like a defective mobile rack, could it manifest itself as a dropped drive on the Sil controller and data corruption on the 3ware?

I'm scared boss, real scared. I'm hanging on to that backup like a glove. I also have a copy of everything made the day of the array change. Thousands and thousands of files...any ideas will be appreciated.
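With thousands of files, one way to find out which ones differ from the known-good backup is to compare checksums instead of opening files one at a time. Below is a minimal sketch, assuming both the live data and the backup are reachable as ordinary folders; the function names are made up for illustration. Files that were legitimately edited after the backup will also show up as mismatches, so the result is a list of candidates to inspect, not proof of corruption.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_trees(live_root, backup_root):
    """Compare every file under backup_root with its counterpart under live_root.

    Returns (mismatched, missing): relative paths whose contents differ,
    and relative paths present in the backup but absent from the live tree.
    """
    live_root = Path(live_root)
    backup_root = Path(backup_root)
    mismatched, missing = [], []
    for backup_file in sorted(backup_root.rglob("*")):
        if not backup_file.is_file():
            continue
        rel = backup_file.relative_to(backup_root)
        live_file = live_root / rel
        if not live_file.is_file():
            missing.append(rel.as_posix())
        elif file_digest(live_file) != file_digest(backup_file):
            mismatched.append(rel.as_posix())
    return mismatched, missing
```

Calling compare_trees with the array's data folder and the backup folder (whatever those paths are on the server in question) returns the two lists for review.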

Tony

 
Did you re-build the array on the new board or merely swap the drives to the new board? If the latter I suspect that could be your problem.

Each RAID controller manufacturer has its own proprietary method of putting the RAID info on the disks. Unless the manual says you can use disks set up on one manufacturer's board in another manufacturer's system, then generally you can't.

Alternatively the new board could have a problem of some kind.
 
Did you re-build the array on the new board or merely swap the drives to the new board?

No, I completely rebuilt the array from formatted drives, then created a New Volume, converted and formatted the array. I then copied the data to the new array via Ghost clone.

I will call 3ware this morning.
 
Hmmm!
Maybe the old RAID controller was not so much the issue. One bad hard drive may be dragging you down, so the new controller is just reacting to the bad drive in a different way than the old one did.

Was the drive that was dropped the same drive every time???
Also, what voltages are you seeing in the mobile rack?
The best way to test is to leave a multimeter hooked in and monitor the voltages as you run the system. I have a multimeter that records high/low over a period of time.
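As a rough illustration of what to do with the recorded high/low readings: the drive power connectors carry +5 V and +12 V rails, and the ATX specification allows about 5% deviation on each. The small sketch below checks a recorded low/high pair against that band. The function name and the sample readings are made up for illustration.

```python
# Nominal rail voltages on a drive power connector.
RAILS = {"+5V": 5.0, "+12V": 12.0}
# ATX allows roughly +/-5% on these rails (assumed tolerance for this sketch).
TOLERANCE = 0.05


def check_rail(rail, low, high, tolerance=TOLERANCE):
    """Return True if both the recorded low and high stay inside the band."""
    nominal = RAILS[rail]
    lower_limit = nominal * (1 - tolerance)
    upper_limit = nominal * (1 + tolerance)
    return lower_limit <= low and high <= upper_limit


# Hypothetical readings: a +12V rail that sagged to 11.2 V under load fails.
print(check_rail("+12V", 11.8, 12.2))
print(check_rail("+12V", 11.2, 12.1))
```

A rail that dips out of the band only while the drives are busy would point toward the power supply or the rack wiring before the drives themselves.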

Can you get a single drive with the same or larger capacity as your backup? That way you can stay operational while you troubleshoot and test for data corruption. I have done that before, and I keep a drive just for that purpose.

You spent a LOT of money on that card, so call their tech support to see if it is an issue with the card. It doesn't hurt to test the drives and to call about the mobile rack as well.

Good luck.
 
Was the drive that was dropped the same drive every time???

No, at first it was always #2. I ran diagnostics on it and it was fine, but I still bought a replacement and returned the original to WD for replacement. Then #3 dropped, then #1, so the problem was not drive-specific. I have run WD diags on every drive, and all passed.

Also, what voltages are you seeing in the mobile rack?

No idea, but it's got an Antec 550 PSU and (2) power connectors for the mobile rack, made by AMS. My next step would be to remove the drives from the rack and connect them directly, to eliminate the rack as a cause.

Can you get a single drive with the same or larger capacity as your backup?

Yes, I do. So you would recommend taking the array down now? My backup system makes that an easy task; I will only need to re-map the clients. I think that is what you are saying.

You spent a LOT of money on that card so call their tech support to see if it is an issue with the card

Spent some time on the phone today w/ 3ware tech support. He said any errors in the mobile rack backplane would be logged in the 3ware log, and none were. There is an ftdisk warning in the SBS Eventlog, Event ID 57: "The system failed to flush data to the transaction log. Corruption may occur."

His advice was to update driver & firmware, and send him the eventlogs for review.

Thanks for your help.

Tony
 
What I would do is run a single drive for now, without the array, and let your clients do their normal workday on it. Then, in the off hours, run the array in a test mode. That way your clients are not affected by data loss or corruption.


Use a boot manager to load the arrayed system as a testing ground. Better yet, disconnect the single drive while testing so you will not corrupt or lose the day's work on it.

This is important: when you are testing the array, stress the whole system to the max.
The best stress tester I have found is Prime95, but it does not test or stress hard drives.

I was kind of thinking that you may be running into a problem unrelated to the array, the hard drives, or the controller. The stress testing may ferret out the problem.
Also, the client with the corrupt Excel file should be checked to rule out that computer as the source of the corrupt files.

Reading your post again... multiple drives dropping. My first step in that scenario is to replace the power supply to those drives and/or the whole unit. I assume there is good cooling to the mobile RAID rack.

Now, if you get data corruption with only one drive, you can pretty much rule out the RAID system as the primary fault.
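Since Prime95 does not exercise the disks, a simple write-then-verify pass is one way to load the storage path during the off-hours test. The sketch below is only an illustration under assumptions: the function name and file naming are made up, and the operating system may serve the read-back from its cache instead of the platters unless the total amount written is well beyond installed RAM.

```python
import hashlib
import os
from pathlib import Path


def write_and_verify(target_dir, file_count=4, size_bytes=1024 * 1024):
    """Write random files, read them back, and return names that fail to verify."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    expected = {}
    for i in range(file_count):
        data = os.urandom(size_bytes)
        path = target / f"verify_{i:04d}.bin"
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data to the device
        expected[path] = hashlib.sha256(data).hexdigest()

    failures = []
    for path, digest in expected.items():
        with open(path, "rb") as f:
            if hashlib.sha256(f.read()).hexdigest() != digest:
                failures.append(path.name)
    return failures
```

An empty list means every file read back exactly as written. Running the same pass against the single drive and against the array would help separate a storage fault from a client-side one.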



 
the client with the corrupt excel file should be checked to rule out that computer as the source of the corrupt files.

This has to be it. I checked the client's Eventlog (my personal laptop, with the entire database synchronized on it) and found an Offline Files Error Id 5:

"A portion of the Offline Files cache has become corrupted. Restart the computer to clean up the cache."

Well done!
 
I would still check that mobile RAID rack, and I am glad you found that issue with the Excel files.
 