Here's my problem: All too frequently, our computer room gets too hot. Today it got up to 93 F (34 C) before we knew the air conditioning had failed. This has happened several times, and each time we have lost one or two SSA disks two weeks after the overheating.
My HP system just goes into "hibernation" when the temperature gets too high. The disks stay spinning, but you can't get to the system and anyone who is logged in gets kicked out (that's how I discovered today's overheating).
Is there some way I can add a condition to errpt so that an error is logged when the internal temperature of the SSA drawers or the server itself gets above a certain point? I have the commands to extract the internal temp of the server and the SSA drawers, and I have a script that warns me by email when the number of errors in the error report changes.
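A minimal sketch of the logging half, under stated assumptions: the temperature has already been read by whatever command is in use (not shown), the 90 F limit is a made-up value, and the names `build_message` and `log_overtemp` are hypothetical. It relies on the AIX `errlogger` command, which, as far as I know, writes an operator message into the error log so that it shows up in errpt output; check the man page before depending on it.

```python
import subprocess

THRESHOLD_F = 90.0  # hypothetical limit; pick one that suits the hardware


def build_message(source, temp_f, threshold_f=THRESHOLD_F):
    """Return an error-log message if temp_f is over the limit, else None."""
    if temp_f > threshold_f:
        return f"OVERTEMP {source}: {temp_f:.1f} F exceeds {threshold_f:.1f} F"
    return None


def log_overtemp(source, temp_f, threshold_f=THRESHOLD_F, run=subprocess.run):
    """Log an operator message via errlogger when the reading is too high.

    `run` is injectable so the logic can be tried without touching the
    real error log. errlogger normally has to be run as root.
    """
    message = build_message(source, temp_f, threshold_f)
    if message is not None:
        run(["errlogger", message], check=True)
    return message
```

Run from cron every few minutes, something like this would add one error-log entry per over-limit reading, which an existing "error count changed" email script would then pick up.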
Is there some way I can automatically get the system to shut down gracefully if the temperature remains above a certain level (for those times at night when nobody is here to shut the systems down)?
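One way to picture the "remains above a certain level" requirement is to shut down only after several consecutive readings are over the limit. The sketch below is illustrative, not a tested AIX procedure: `read_temp_f` stands in for whatever command extracts the temperature, the threshold, reading count, and interval are made-up values, and the default `shutdown -h now` command line is an assumption that should be checked against the local shutdown man page.

```python
import subprocess
import time


def should_shut_down(readings, threshold_f, required):
    """True only if the last `required` readings are all above the limit."""
    if len(readings) < required:
        return False
    return all(t > threshold_f for t in readings[-required:])


def halt_system():
    # Assumed command line; verify the flags on the target system.
    subprocess.run(["shutdown", "-h", "now"], check=True)


def watch(read_temp_f, threshold_f=95.0, required=3, interval_s=300,
          shutdown=halt_system, sleep=time.sleep):
    """Poll the temperature and shut down after a sustained overheat.

    A single hot reading is ignored; `required` hot readings in a row,
    taken `interval_s` seconds apart, trigger the shutdown.
    """
    readings = []
    while True:
        readings.append(read_temp_f())
        readings = readings[-required:]  # keep only the recent window
        if should_shut_down(readings, threshold_f, required):
            shutdown()
            return readings
        sleep(interval_s)
```

Requiring consecutive readings avoids halting the machine on one bad sample, at the cost of reacting `required * interval_s` seconds after the room first goes over the limit.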
I am wondering if there is something with powerfail I can use to do this? (I would check the man pages and the rc.powerfail script, but my systems are currently down!)
A real monitoring system would be best, but I don't think management will want to spend the money.
Any suggestions or help would be most appreciated.