IBM BladeCenter Power Problems

Status
Not open for further replies.

mytelecoms

Vendor
Jul 18, 2005
179
0
0
GB
Hi folks,

There's a big star for anyone who can help with this one.

We have 12 IBM BladeCenter chassis. All are 8677 variants.

Each chassis is fully loaded with blades, modules and power supplies.

Each power domain (two PSUs) in each chassis is connected to a different distribution block in the cab. The distribution blocks in each cab are on different phases.

We have four cabs - 3 chassis per cab.

We're regularly getting alerts indicating that, for an interval of one or two seconds, power appears to have been cut to a power supply in one of the domains. The message is as follows:

Power modules are nonredundant in domain x

This is happening on several chassis which are in different cabs, and there is no clear pattern to it.

Having pushed and pushed the people in our data suite, they tell me that there is nothing wrong with the supply. I am less convinced, but was willing to look elsewhere. So, I upgraded the MM firmware on every chassis to 1.21i, which says it corrects certain power management alerts.

However, since doing the upgrades, I'm still seeing the alerts.

What's more, we've had two power supplies fail in two weeks. Both were in the same cab and were less than 12 months old.

Now, even if the PSU failures were a coincidence, am I right in thinking I would only get these messages if the supply genuinely did fail for that period of time? Or is there a known issue where BladeCenters report crazy details about their PSUs?

Any help appreciated, as always!!

Cheers

Chris
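
Since the alerts show "no clear pattern", one way to test that is to tally them per chassis and domain and to look for alerts that land within a few seconds of each other on different chassis. Near-simultaneous alerts on separate chassis would point towards the shared feed rather than the BladeCenters themselves. The following is only a rough sketch: the comma-separated log format, the chassis names and the sample timestamps are all invented for illustration and are not a real management module export format.

```python
from collections import Counter
from datetime import datetime, timedelta


def parse_event(line):
    """Parse 'YYYY-MM-DD HH:MM:SS,chassis,domain' (an assumed, made-up format)."""
    stamp, chassis, domain = line.strip().split(",")
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), chassis, int(domain)


def count_by_domain(events):
    """Count alerts per (chassis, domain) pair."""
    return Counter((chassis, domain) for _, chassis, domain in events)


def coincident_groups(events, window_seconds=5):
    """Return runs of alerts that are close in time and span more than one chassis.

    An alert joins the current run if it is no more than window_seconds
    after the previous alert in that run.
    """
    ordered = sorted(events, key=lambda e: e[0])
    groups, current = [], []
    for event in ordered:
        if current and event[0] - current[-1][0] > timedelta(seconds=window_seconds):
            groups.append(current)
            current = []
        current.append(event)
    if current:
        groups.append(current)
    return [g for g in groups if len({chassis for _, chassis, _ in g}) > 1]


# Invented sample data, purely to show the shape of the input.
sample = [
    "2010-03-01 02:14:07,chassis03,1",
    "2010-03-01 02:14:08,chassis07,1",
    "2010-03-02 11:40:00,chassis03,2",
    "2010-03-04 19:05:30,chassis03,1",
]
events = [parse_event(line) for line in sample]
counts = count_by_domain(events)
groups = coincident_groups(events)
```

If `groups` stays empty over a few weeks of real alerts, the dips are probably not coming from a shared upstream source. If `counts` is heavily skewed towards one cab or one phase, that is worth taking back to the data suite people.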
 
In your power domains do you have it set for redundant power with performance impact?
 
Good question. On the first eight we have redundant without performance impact. On two, we have redundant power with performance impact. Most of those machines report the problem, regardless of the setting. Two more are set to non-redundant and don't have a problem.
 
Which Power supplies do you have?

There were 1200 watt (I think) and 2000 watt versions.

We had to swap our 1200s out for the 2000 watt.

Similar things were happening to us.
 
We changed all of our PSUs to 2000W less than 12 months ago, because we were buying the newer SCSI blades which pull more power.

Thanks for the suggestion though.
 
This kind of issue needs some explanation. It is a known limitation of the 8677. The blades originally designed for the 8677 chassis did not require as much power as the newer blades. As power requirements increased, IBM developed higher-capacity power supplies, which eventually topped out at 2 kW apiece. The blades, however, kept requiring more power, depending on the build. What has essentially happened is that the power supplies in a fully populated 8677 do not have enough capacity to carry each power domain on a single power supply, so full redundancy cannot be guaranteed. The result is possible throttling.

It would help to know how your chassis is populated.
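
To make the arithmetic concrete, here is a rough sketch of the check described above: add up what everything in one power domain draws and compare it with what one supply, and then both supplies, can deliver. Only the 2000 W rating comes from this thread. The per-blade wattages in the usage lines are hypothetical placeholders, not IBM figures, and the function name and status labels are made up for the example.

```python
PSU_CAPACITY_W = 2000  # per-supply rating mentioned in this thread


def domain_status(loads_w, psu_capacity_w=PSU_CAPACITY_W, psus=2):
    """Classify one power domain from the wattage of each component in it.

    'redundant'     : a single supply can carry the whole domain
    'nonredundant'  : all supplies are needed; losing one means throttling or worse
    'overcommitted' : even all supplies together fall short
    """
    total = sum(loads_w)
    if total <= psu_capacity_w:
        return "redundant"
    if total <= psu_capacity_w * psus:
        return "nonredundant"
    return "overcommitted"


# Hypothetical per-blade draws in watts, for illustration only.
older_blades = [250] * 7
newer_blades = [320] * 7

older_status = domain_status(older_blades)
newer_status = domain_status(newer_blades)
```

With real figures for each blade and module in a domain, the same comparison shows whether a given population can actually run redundantly on 2000 W supplies.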
 