
No failover after a raid1 hard disk crash

Status
Not open for further replies.

Bossnet

IS-IT--Management
Sep 18, 2006
3
LU
Hi,

My configuration is Windows Server 2003 Enterprise SP1 with Exchange 2003 SP2 in a two-node cluster (active-passive).

Our hosting environment stopped responding after one hard disk in a RAID 1 mirror crashed. What's weird is that the cluster didn't fail over to the second node (the server was still responding to TCP/IP requests) and MOM didn't notice the failure. With that kind of setup (RAID 1 + clustering), we were hoping to avoid this kind of problem, but it happened. Is there a way to prevent it?

Thanks in advance for your help
 
If the host still responds to ping, and the cluster service on the crashed node still responds to the cluster service on the passive node, then unfortunately the cluster service has no way of knowing that the system crashed.

If you set up the servers to reboot after a crash, that "should" take care of it: the crashed node will start a reboot, and the passive node should then know that it's down.
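
To illustrate the point above, here is a minimal Python sketch. It is not the actual Cluster Service logic, just a toy model of why a node that still answers heartbeats is treated as healthy even when its storage has hung. The names `Node`, `heartbeat_only_decision` and `decision_with_io_probe` are hypothetical, and the I/O probe is an assumed extra check, not something the cluster does by default.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    answers_heartbeat: bool  # network stack / cluster service still replies
    disk_io_ok: bool         # storage actually completes reads and writes


def heartbeat_only_decision(active: Node) -> str:
    """Fail over only when the heartbeat stops (the situation described above)."""
    return "stay" if active.answers_heartbeat else "failover"


def decision_with_io_probe(active: Node) -> str:
    """Hypothetical variant: also fail over when an I/O probe fails."""
    if not active.answers_heartbeat or not active.disk_io_ok:
        return "failover"
    return "stay"


# A node whose disks have hung but which still answers on the network.
hung = Node("node1", answers_heartbeat=True, disk_io_ok=False)
# The same node once it has started to reboot and stopped answering.
rebooting = Node("node1", answers_heartbeat=False, disk_io_ok=False)

print(heartbeat_only_decision(hung))       # the hang goes unnoticed
print(heartbeat_only_decision(rebooting))  # the reboot makes the failure visible
print(decision_with_io_probe(hung))        # an I/O-aware check would catch it
```

The second call is the reason a reboot after a crash helps: it turns a silent hang into a missing heartbeat, which is the one signal the passive node can see.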

Denny
MCSA (2003) / MCDBA (SQL 2000)
MCTS (SQL 2005 / Microsoft Windows SharePoint Services 3.0: Configuration / Microsoft Office SharePoint Server 2007: Configuration)
MCITP Database Administrator (SQL 2005) / Database Developer (SQL 2005)

--Anything is possible. All it takes is a little research. (Me)
[noevil]
 
I'm not sure what the issue is. Is it that the server didn't fail over to node 2? I'm not sure that it would in a RAID 1 config. If one drive failed, the disk resource would not go offline, so the cluster would never know there was a problem. That is what should happen with RAID 1: you are protecting yourself from exactly this kind of hardware failure. Now, it is known that if a shared cluster disk is in a RAID 5 (software RAID) config, the drives will not fail over to the passive node until the RAID issue is resolved. But I would say that is not the issue in your case, since you are using RAID 1.
 
OK, thank you both very much. I think the cluster is not responsible; the RAID controller should be. It should have noticed the failure and switched to the healthy hard drive. The hardware must also be monitored within MOM. A ticket has been opened with HP and I'm waiting for the answer.

I will post the reply and the causes when I receive them, for the benefit of others who might have this issue.
Regards
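
For anyone who wants a software-side check in addition to hardware monitoring, here is a rough Python sketch of a disk write probe. It is not a MOM feature and it does not read the RAID controller's status; it only tests whether a small write to a given directory completes within a time limit, which is the kind of symptom a hung disk subsystem would show. The function name `disk_write_probe` and the 5-second default are assumptions chosen for illustration.

```python
import os
import tempfile
import threading


def disk_write_probe(directory: str, timeout_s: float = 5.0) -> bool:
    """Return True if a small file can be written and flushed in time."""
    result = {"ok": False}

    def _write() -> None:
        try:
            fd, path = tempfile.mkstemp(dir=directory)
            try:
                os.write(fd, b"probe")
                os.fsync(fd)  # force the data down to the disk
            finally:
                os.close(fd)
                os.remove(path)
            result["ok"] = True
        except OSError:
            # Missing directory, read-only volume, I/O error, and so on.
            result["ok"] = False

    # Run the write in a worker thread so a hung disk cannot block the caller.
    worker = threading.Thread(target=_write, daemon=True)
    worker.start()
    worker.join(timeout_s)
    return result["ok"] and not worker.is_alive()


if __name__ == "__main__":
    print(disk_write_probe(tempfile.gettempdir()))
```

A monitoring job could run such a probe against the volume in question and raise an alert when it returns False. A degraded mirror that still serves I/O would pass this check, so it complements the controller's own alerts rather than replacing them.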
 