
monitoring for lost connection

Status
Not open for further replies.
Oct 15, 2002
153
US
We have a switch going out somewhere, I think, but I am not sure where. I need a way to hold open a connection to several systems on various parts of my LAN and WAN, and then report when a connection is broken and cannot be restored within several seconds.

At random, sparse intervals, our connection to one of our main enterprise servers drops. Almost all employees access it via telnet; out of the blue the telnet client freezes, and after 20-30 seconds it either returns to normal or, more usually, disconnects. I have also noticed that at the exact same time I am not passing any packets from my desk to the Internet. The outage is so short that by the time we realize what is happening, it has corrected itself before we can run any diagnostics.

I don't think it's a switch rebooting, as all of my switches are at least web managed and require much more than 30 seconds to restart. The SNMP-managed core switch shows no drop in uptime, so it's not rebooting.

The failures are few and far between (anywhere from once every couple of weeks to once every 2 days), but due to the nature of some of the terminal sessions, getting kicked off can cause pretty nasty results. Some modules require users to spend hours entering data with no way to save or pause the process; they have to start and keep going until they are finished. If they are interrupted they must start over at the beginning. (Yes, VERY bad design.)


I get the feeling it's my Dell layer 3 core switch, which interconnects all of our buildings as well as the fiber WAN link to our NOC and the rest of our sites, but I can't say for sure.

We use an active monitoring package called IP Sentry, but it only polls about once every 90 seconds, so the odds so far have been that the failure falls between polls. The time required to poll all sites is long enough that even if the link did die during a polling pass, we wouldn't be able to test enough nodes to get a good grasp of what is causing it.

Is there an easy way to open a connection to multiple TCP/IP devices and alert when a connection is dropped and can't be restarted? If I can locate the test server in the right spot and monitor across just the right links, I can probably narrow down which link in the chain is failing.


Suggestions?
 
See if you can enable syslogging from the switch to the server people connect to, and set up a syslog daemon (like Kiwi, which is free). You can also send ICMP keepalives every second to see which switch could be malfunctioning. Then deep-six the switch and call it a day...

Burt
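
One way to act on the once-per-second probing idea is a small script that checks every target in parallel and timestamps each state change. The sketch below is only an illustration under some assumptions: it uses a TCP connect attempt rather than ICMP (ICMP needs raw sockets and usually elevated privileges), and the function names `tcp_probe`, `find_outages`, and `monitor` are made up for this example, not part of IP Sentry or any other tool mentioned in the thread.

```python
import socket
import time
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime


def tcp_probe(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def find_outages(samples, min_seconds):
    """Given time-ordered (timestamp, ok) samples for one target, return
    (start, end) pairs for failure runs lasting at least min_seconds.
    A run still failing at the last sample is reported with end=None."""
    outages = []
    down_since = None
    for ts, ok in samples:
        if not ok and down_since is None:
            down_since = ts  # first failed probe of a new run
        elif ok and down_since is not None:
            if ts - down_since >= min_seconds:
                outages.append((down_since, ts))
            down_since = None
    if down_since is not None and samples[-1][0] - down_since >= min_seconds:
        outages.append((down_since, None))
    return outages


def monitor(targets, interval=1.0, rounds=None):
    """Probe every target in parallel once per interval and print state changes.
    targets maps a label to a (host, port) pair; rounds=None runs until interrupted."""
    state = {name: True for name in targets}  # assume everything starts up
    count = 0
    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        while rounds is None or count < rounds:
            started = time.monotonic()
            futures = {name: pool.submit(tcp_probe, host, port, interval)
                       for name, (host, port) in targets.items()}
            for name, fut in futures.items():
                ok = fut.result()
                if ok != state[name]:
                    stamp = datetime.now().isoformat(timespec="seconds")
                    print(f"{stamp} {name} {'UP' if ok else 'DOWN'}")
                    state[name] = ok
            count += 1
            # Sleep only for the remainder of the interval, if any.
            time.sleep(max(0.0, interval - (time.monotonic() - started)))
    return state
```

A call such as `monitor({"core-switch": ("192.0.2.1", 23), "noc-router": ("192.0.2.2", 23)})` (placeholder labels and addresses) would print a line whenever a target flips between UP and DOWN. Running it from a machine on each side of a suspect link and comparing which targets go DOWN together should help narrow down the failing hop, and `find_outages` can filter logged samples down to drops that lasted longer than a few seconds.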
 
Also double-check that your router or firewall isn't dropping long-lived connections.
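
To test whether a long-lived connection is being cut somewhere along the path, one option is to hold a TCP session open with keepalive enabled and record when it ends. This is a rough sketch, not a definitive tool: the helper names and the timing values are assumptions chosen for illustration, and the per-socket tuning options (`TCP_KEEPIDLE`, `TCP_KEEPINTVL`, `TCP_KEEPCNT`) are not available on every platform, so the code applies them only where Python exposes them.

```python
import socket


def enable_keepalive(sock, idle=5, interval=2, count=3):
    """Turn on TCP keepalive; tune the timers where the platform allows it.
    idle: seconds of silence before the first keepalive probe
    interval: seconds between probes; count: failed probes before giving up."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        opt = getattr(socket, name, None)  # missing on some platforms
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock


def hold_until_dropped(host, port, connect_timeout=5.0):
    """Open a connection, hold it idle, and return a short reason when it ends.
    Any data the peer sends (for example a login banner) is discarded."""
    with socket.create_connection((host, port), timeout=connect_timeout) as sock:
        enable_keepalive(sock)
        sock.settimeout(None)  # block until the peer closes or keepalive fails
        try:
            while True:
                data = sock.recv(4096)
                if not data:
                    return "closed by peer"
        except OSError as exc:
            return f"dropped: {exc}"
```

With the example values, a dead path would be noticed after roughly 5 + 2 × 3 = 11 seconds on platforms that honor all three options. Logging the time `hold_until_dropped` returns, for sessions that cross the router or firewall and for sessions that do not, gives a way to compare the two paths.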
 