How to know when a disk subsystem has stopped responding

darkstar · Sep 10, 1999

Greetings, 
We had a case where a server with external disks had a bad problem. Somehow the power cord for the disks got pulled out and the disks stopped responding (of course). 
Oddly enough, the kernel kept running, at least for the time being, and the box responded to pings, SNMP traffic, etc. 
Our management software never knew there was anything amiss with the box. 
Now that the post-mortem is in, managers want to know whether we can detect when the disk stops responding on any given system. 
This is a dilemma. A shell program that runs periodically will not suffice: with the disks gone, the system won't be able to read the script, and even if the script does run (say, because it is permanently memory resident), what would it do? In all probability it would block waiting for disk I/O and never respond. 
Has anyone ever worked out a problem like this before?

slars · Sep 10, 1999

Darkstar, 
 
I have a few questions for you. Obviously, the external drives are the ones that lost power. Typically, a machine will have at least one internal drive where you load the O/S. Is this not the case with your server? 
 
If you do have an internal drive, you could easily write a script that lives on a file system that physically resides on your internal drive that monitors the status of your external drives. 
 
Also, there are a myriad of 3rd party products that monitor everything from drive status to network traffic, cpu utilization, etc. that could run on a separate machine to monitor this machine. 
 
slars

darkstar · Sep 10, 1999

In this case there are no internal disks. And we are indeed using a 3rd party product (ITO) to monitor the system. All of ITO's processes that check the wellness of the machine locked up as well, apparently pending on I/O.

OpenSys · Nov 8, 1999

Do you have another host you can use? If so, why not have a cron job on the monitored host check the disk subsystem and, if all is well, write a checkpoint file to another host. At predetermined intervals, a cron (Unix) or AT (NT) job on the second host checks whether that file has changed. If the file hasn't changed over that interval, or over a predetermined number of intervals, a rule flags the event via email or some other form of notification. It's a little cumbersome, but at a minimum it takes the hang notification away from the host that is potentially going to hang.
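
A minimal sketch of the check on the second host, in Python, assuming the monitored host refreshes a checkpoint file on each successful disk check. The function name, the path, and the age threshold are hypothetical and would need adapting; the notification step is left out.

```python
import os
import time


def heartbeat_is_stale(path, max_age_s, now=None):
    """Return True if the checkpoint file is missing or older than max_age_s.

    `path` is the checkpoint file written by the monitored host (hypothetical
    location); `now` can be injected to make the check easy to test.
    """
    now = time.time() if now is None else now
    try:
        mtime = os.stat(path).st_mtime  # last time the checkpoint was written
    except FileNotFoundError:
        # No checkpoint at all is treated the same as an old one.
        return True
    return (now - mtime) > max_age_s
```

Run from cron on the second host, a call such as `heartbeat_is_stale("/var/tmp/server1.checkpoint", 600)` (path and threshold are placeholders) would return True once the monitored host has stopped updating the file, at which point the job can send mail.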

MikeLacey · Nov 29, 1999

You might want to think about monitoring /var/adm/syslog/syslog.log with something that mails you when a bad thing happens. 
 
You could base this on something that reads the output of 
 
tail -f /var/adm/syslog/syslog.log 
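
As a rough sketch of "something that reads the output", the Python below filters lines from standard input against a few patterns. The patterns are guesses, not messages taken from a real syslog, and the mail step is reduced to a print; both are assumptions to replace with whatever your system actually logs.

```python
import re
import sys

# Hypothetical patterns; replace them with messages your syslog really shows.
ALERT_PATTERNS = [r"scsi", r"i/o error", r"timeout|timed out"]


def matching_lines(lines, patterns=ALERT_PATTERNS):
    """Yield each input line that matches at least one pattern."""
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    for line in lines:
        if any(rx.search(line) for rx in compiled):
            yield line.rstrip("\n")


def main():
    # Reads lines as they arrive on stdin, e.g. from a `tail -f` pipe.
    for hit in matching_lines(sys.stdin):
        print("ALERT:", hit, flush=True)  # swap in your mail command here
```

With a call to `main()` added at the bottom, this could be fed as `tail -f /var/adm/syslog/syslog.log | python watch_syslog.py`, where `watch_syslog.py` is a made-up file name.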
 
Mike 
Mike Lacey, Mike_Lacey@Cargill.Com
Cargill's Corporate Web Site: http://www.cargill.com/

robherc · Nov 29, 1999

How about putting in a memory-resident script (load it into a RAMdisk) that sends you a simple e-mail message whenever it CAN access files on the naughty drives; this way you'd know something was up if you didn't get the message (i.e. make the message ONLY have a subject such as: "drives working" to keep download time used by messages @ a minimum). 
Do you think this would help with your problem? 
-Robherc, robherc@netzero.net 
*nix installation & program collector/reseller. Contact me if you think you've got one that I don't
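
One way such a checker could avoid hanging along with the disks is to do each disk access in a child process and give up after a timeout. The Python below is only a sketch under assumptions: the function name and timeout are made up, the Python interpreter itself must be loadable without touching the failed disks (for example from the RAM disk), and a read that is satisfied from the buffer cache may succeed even though the disk is gone.

```python
import subprocess
import sys

# Code run by the child: open the file and read one byte from it.
_CHILD_CODE = (
    "import os, sys\n"
    "fd = os.open(sys.argv[1], os.O_RDONLY)\n"
    "os.read(fd, 1)\n"
    "os.close(fd)\n"
)


def probe_path(path, timeout_s=5.0):
    """Return 'ok', 'error', or 'timeout' for a read attempt on `path`."""
    proc = subprocess.Popen(
        [sys.executable, "-c", _CHILD_CODE, path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        rc = proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Best effort only: a child blocked in disk I/O may ignore the kill,
        # so we deliberately do not wait for it again.
        proc.kill()
        return "timeout"
    return "ok" if rc == 0 else "error"
```

The parent only ever waits with a deadline, so it can still send the "drives working" mail on `ok` and stay silent, or raise an alarm through another channel, on `timeout`.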

AndyBo · Nov 30, 1999

Take a look at Big Brother at http://www.bb4.com

This will watch your system logs for WARNING messages and similar, or it would be easy to add in your own script to throw up a warning to Big Brother if something went wrong.
