network collapsing multiple times a day

Status
Not open for further replies.

tamray

IS-IT--Management
Feb 8, 2005
US
I have a K-12 school network giving me fits. All user connections drop off multiple times a day. The outages last only 18-22 seconds, or about 4-6 "Request timed out" replies (very consistently). The servers are all on one switch and never lose connection to each other. All users eventually connect to a single switch upstream from the server switch (no errors on any managed switches). I believe a bad device or a faulty switch port somewhere is causing the problem. Increased network activity triggers the outages: the failures come about the time 30+ users log in, and occasionally after that (16 times yesterday). I isolated a computer lab in hopes of verifying whether it was causing or contributing to the problem. Ntop shows the network running at only 5% load. The outages are so random and so brief that I have not been able to run anything like Wireshark to help identify the problem. There are a ton of unmanaged 5-8 port switches in the building, which complicates things further. Is there something I can run or do that will help figure out what is going on here?
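
One way to pin down outages this brief is to leave a timestamped reachability probe running and group the failures into windows afterwards. The following is only a minimal sketch, not a tested tool: `tcp_probe`, `monitor`, and `find_outages` are hypothetical names, and the host, port, and one-second interval are assumptions to be replaced with whatever fits the network.

```python
import socket
import time
from datetime import datetime


def tcp_probe(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def monitor(host, port, interval=1.0, duration=3600.0):
    """Probe once per interval for duration seconds; return (timestamp, ok) samples."""
    samples = []
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        samples.append((datetime.now(), tcp_probe(host, port)))
        time.sleep(interval)
    return samples


def find_outages(samples, min_failures=2):
    """Group consecutive failed probes into (start, end, failed_count) windows.

    samples must be in chronological order. Runs shorter than
    min_failures are treated as noise and ignored.
    """
    outages = []
    run = []
    for ts, ok in samples:
        if not ok:
            run.append(ts)
            continue
        if len(run) >= min_failures:
            outages.append((run[0], run[-1], len(run)))
        run = []
    # A run of failures may still be open when the samples end.
    if len(run) >= min_failures:
        outages.append((run[0], run[-1], len(run)))
    return outages
```

Running `monitor` from several points at once (for example, one client per wiring closet, each probing a server) and comparing the resulting windows may show which segment drops first.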
 
It sounds like STP reconvergence and/or simple duplex mismatches. You should document (diagram) the network topology to verify where any loops are and tune the network accordingly. You should also check that there are no duplex mismatches between switchports and between hosts and switchports.

HTH

andy
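
Once the topology is written down as a list of links, finding where loops exist can be automated. The sketch below is illustrative only: it uses a union-find pass to flag every link that closes a cycle, the function name `find_redundant_links` is hypothetical, and the switch names in the usage note are made up rather than taken from this network.

```python
def find_redundant_links(links):
    """Return the links that close a loop in an undirected topology.

    links is a list of (device_a, device_b) pairs. A link is reported
    when both ends are already connected through earlier links, which
    includes a cable patched between two ports of the same switch.
    """
    parent = {}

    def find(node):
        # Locate the representative of node's group, halving paths as we go.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    redundant = []
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            redundant.append((a, b))
        else:
            parent[root_a] = root_b
    return redundant
```

For example, with the hypothetical links `[("core", "server-sw"), ("core", "lab-sw"), ("lab-sw", "library-sw"), ("library-sw", "core")]`, the last link is reported because the library switch already reaches the core through the lab switch. A reported link is not necessarily a fault: with STP running it is simply a place where a port should be blocking, so it is where to check STP state first.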
 
For managed switches, they are using SMC Tiger switches and Cisco 3500 XLs. Two of the SMC switches are connected via fiber. The rest are connected via Cat 5. The connection from the last SMC switch to the server room Cisco switch was fiber, but I changed it to Cat 5 during testing. The problem still occurs regardless of how they are connected.
 
I checked all ports for errors when this first started, but nothing really glaring there.
 
I am with ADB100, it sounds like you have a loop in your network somewhere. I would check all your uplinks.
 
I worked in a school district where people would plug cables from one port in a Nortel switch directly into another, and do other things that caused loops all the time. They would wire switches redundantly and turn off STP. They would also bring in their own broadband routers and switches, thinking they could get on their home network! I was dealing with that stuff all the time.

Burt
 
Yup, I'll throw my hat in too: someone is occasionally plugging something in and causing a spanning tree loop. Believe me, this will take the network down in the blink of an eye. All it takes is to plug two ports into each other and it's off to the races.
 
I will also throw my vote toward temporary looping. That's the most likely candidate for something like this.

Then again, if it happens daily or multiple times a day, it would be interesting to find out why. It may be something more complex than a simple case of a user connecting and disconnecting a cable somewhere.
 
I am positive it is not a user plugging a cable in. As mentioned in my original post, it is directly related to multiple PCs logging in at the same time, and possibly to high usage times. We run a remote desktop environment, so high usage is 5-10% load. Outside of the brief outages, everything runs perfectly. I isolated the main lab the other day, along with a few rooms that connect through it. I didn't go through with the separate VLAN, but ran their connection through the orange interface of an IPCop box. This seemed to minimize the outages, but did not eliminate them. I put things back the way they were, hoping I would get a more severe outage, which would make it easier to locate the cause. I am back to my original theory that there is a flaky device on the network and the increased load is just enough to cause it to act up.
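
The link between logins and outages can be checked against logs rather than by impression. As a rough sketch, assuming outage start times and login times have already been collected as `datetime` values (the function name `logins_before` and the 60-second window are illustrative assumptions, not anything taken from this network):

```python
from datetime import timedelta


def logins_before(outage_starts, login_times, window=timedelta(seconds=60)):
    """For each outage start, count the logins in the window leading up to it.

    Returns a list of (outage_start, login_count) pairs.
    """
    counts = []
    for start in outage_starts:
        n = sum(1 for t in login_times if start - window <= t <= start)
        counts.append((start, n))
    return counts
```

If most outages show a burst of logins just beforehand and quiet periods show none, that supports the load theory. Outages with no logins nearby would point elsewhere.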
 
I found the problem on Saturday. It was a Cisco 1900 switch in the school library. The cooling fan had died. The network has been stable for 4 days now, after replacing the switch.
 
Thanks for letting us know. That was a weird one. Nothing like an overheating network device to make things act strangely.
 