
Insolvable network problems

Status
Not open for further replies.

johanneke69

IS-IT--Management
Mar 1, 2001
14
0
0
US
We have a strange network problem on our LAN.

At seemingly random times our network clients lose their network connection to a server. When this occurs they can't connect to that one server but are still able to connect to other servers.
This results in many strange errors:

"Disk or network error" (MS-Access)
"Cannot save document you must select another file name" (MS-Word and MS-Excel)
"The file is in use by another user… " (MS-Excel)

We have the impression that the problem occurs more often when network traffic is low.
As a result of this we started pinging the servers every second from a client PC connected to the same switch as the servers.

And we discovered that when this error occurs, the server in question gives no replies to the ping command (sometimes for more than 15 seconds; this happens about 20 times a day).
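
A once-per-second ping log like the one described above can be scanned for dropouts with a short script. This is only a minimal sketch, not the tooling we actually used: the `find_outages` name, the `(timestamp, replied)` sample format, and the default 15-second threshold are assumptions made for illustration, and the log data is made up.

```python
from typing import Iterable, List, Tuple


def find_outages(
    samples: Iterable[Tuple[float, bool]],
    min_seconds: float = 15.0,
) -> List[Tuple[float, float]]:
    """Return (start, end) timestamps of long runs of lost pings.

    `samples` is a hypothetical log: one (timestamp_in_seconds, replied)
    pair per ping, in chronological order. A run is reported when the
    time from its first lost ping to the next successful reply is at
    least `min_seconds`. A run still open at the end of the log is not
    reported.
    """
    outages = []
    start = None  # timestamp of the first lost ping in the current run
    for timestamp, replied in samples:
        if not replied and start is None:
            start = timestamp
        elif replied and start is not None:
            if timestamp - start >= min_seconds:
                outages.append((start, timestamp))
            start = None
    return outages


# Made-up log: replies until t=2, silence from t=3 to t=19, reply again at t=20.
log = [(float(t), not (3 <= t < 20)) for t in range(30)]
print(find_outages(log))
```

Counting the entries in the returned list per day would give a figure comparable to the "about 20 times a day" estimate.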

We discovered now that when we open PcAnywhere sessions to the servers the problem disappears.
The server consoles need to be unlocked and the CPU monitor needs to be visible (to send more data over the PcAnywhere session).
When the sessions are open, the servers respond to the ping command at all times and the clients no longer lose their network connections to the servers. When we keep the servers very busy (CPU and LAN), the clients also seem to keep their connections alive.

Does anyone know of this phenomenon or know a solution?


Here are some specs of the things we have.

Servers:
Poweredge 1300 WinNT 4.0 SP6a (with Intel pro 10/100 LAN card)
Poweredge 2500 WinNT 4.0 SP6a (with Intel 8255xx-based 10/100 LAN card)


Network:
Cisco Ethernet switch model 3524
All servers are connected to this switch and they negotiate 100Mbit Full Duplex.



Things we already tested but didn't help:

We tested with other switches (baystack, 3com, Cisco) and hubs (Intel,3Com)
We changed the cabling.
Forced the auto-negotiation to 100/full, 100/half, 10/full, 10/half
Other network cards (in clients and servers)
Other network card Drivers (in clients and servers)
Checked out the temp directories on clients (MS KB: Q150943)
We disabled all the power saving on clients (NT4 Server has no power saving)

Best regards
Johan

 
Well, I finally found someone with issues similar to the ones I have on my W2K network.

I have only about 35 workstations, a combination of Windows 98 and Windows XP systems. One server is running W2K SBS and the other just W2K Server; the 2nd server is a replicated server on the same domain. These servers are scfcu1 and scfcu2 respectively.

We get various problems from "Out of memory, or disk space error" when saving user documents to a "Home" directory share, to various problems with stand alone DB programs. One program called Performease by KG&A uses a DB called Advantage from Extended Systems, Inc.

We get various index and update errors from the DB over the network, but it's odd: sometimes it works, sometimes it does not.

Also scfcu1 is on the network of 192.168.1.x and scfcu2 is on 192.168.2.x

Things I have tried:

Set all my workstation network adapters to fixed 100 Mb full duplex; scanned for viruses (we have Symantec Corporate on ALL systems and the definitions are up to date); checked for spyware/adware on ALL systems; replaced a possibly defective switch; reconfigured our DNS server (4 times!); updated ALL BIOS/firmware/NIC drivers on both servers (Dell PowerEdge 2600: 1 - single Xeon 2.4G, 2 GB RAM, 3 RAID 5 36 GB drives, 100/1Gb Intel NIC; 2 - dual Xeon 2.4G, 4 GB RAM, 3 RAID 5 36 GB drives, Intel 100/1Gb NIC).

We also have 2 Linux boxes: 1 is an intranet server (WEB 192.168.1.250) and the other we use as our firewall (192.168.1.1). 192.168.1.1 is also running a proxy server for some users. I am working toward getting the proxy running on the W2K server and removing it from the Linux box. I use the firewall settings to determine who has open access to the internet and who must go through the proxy for limited (designated) internet service.

We have DNS running on both servers. The gateway is our Linux box, 192.168.1.1. I point every workstation on the 192.168.1.x network to DNS at PRI: 192.168.1.3 (SCFCU1) and SEC: 192.168.2.3 (SCFCU2), and all workstations on the 192.168.2.x network to PRI: 192.168.2.3 (SCFCU2) and SEC: 192.168.1.3 (SCFCU1). The gateways are pointed to 192.168.1.1.

Our local domain is SHELLCU.LOCAL, although the workstations will only connect without the suffix (SHELLCU).

We are not running WINS.

I think that's it. ;-) (I have little hair left)
 
HMM
For 192.168.1.x to get to 192.168.2.x, must they pass through 192.168.1.1?






 
                       internet
                          |
         linux (router/firewall/proxy, 2 NICs)
                          | 172.16.1.1
                            255.255.0.0
                      Hub/Switch
                          |
   ---------------------------------------------
   |                      |                    |
Linux (intranet)       server1              Server2
                  (dns/router, 2 NICs)  (dns/router, 2 NICs)
172.16.2.0           172.16.3.1           172.16.4.1
255.255.0.0          255.255.0.0          255.255.0.0
                          |                    |
                      hub/switch           hub/switch
                          |                    |
                     Workstations         Workstations
                      172.16.3.x           172.16.4.0
                     255.255.255.0        255.255.255.0
 
Does the story with the IP addresses make sense with my question??
 
Sorry, that was in reply to aka's post.

Johanneke69

How's the network configured?

A diagram would be helpful. Sometimes just drawing out your setup shows you the obvious; it's not the first time I haven't seen the wood for the trees.
 
Got an email and I can send you a diagram. my email is carl.slaughter@shellcu.org

 
Format them all and start over! ;) of course, i'm just kidding.
 
The topology is correct, it's not changed for years.
Please read the question AGAIN and try to imagine the problem.


 
I had a slow network once because someone had plugged a phone into a data point (the data network and telco shared the same cabling).

Beginning to grasp at straws, sorry.

It does sound like a routing problem,
i.e. packets to server 1 are going along the M1 and packets to server 2 are on the M25.

Do you have wireless equipment?
Is it possible the traffic is routing through the wireless connection to one server and not the other?

Wireless works on collision avoidance rather than collision detection like a switch/hub, and as such can add a lot of latency to a packet. Maybe experimenting with the TTL may be of help.






 
Just looked at the diagrams you sent, and the only thing I can see is that your DNS servers are on different IP subnets, meaning that to get an IP stored on the secondary DNS, traffic must pass through your firewall.

I don't know if you want to firewall internal traffic, but with all your subnet masks set to 255.255.255.0, any traffic from your 192.168.2.x range must pass through your gateway, which I'm guessing is your firewall, to reach any client/server on the 192.168.1.x range.

Changing the subnet masks on your servers to 255.255.0.0 will allow the servers to see each other without passing through the firewall.
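
To illustrate the mask point, here is a small sketch using Python's standard `ipaddress` module and the two server addresses from this thread. The `same_subnet` helper is a name made up for this example; it only checks whether two hosts compute the same network under a given mask, which is what decides whether a host sends traffic directly or hands it to its gateway.

```python
import ipaddress

# Server addresses taken from this thread.
scfcu1 = ipaddress.ip_address("192.168.1.3")
scfcu2 = ipaddress.ip_address("192.168.2.3")


def same_subnet(a, b, netmask: str) -> bool:
    """True when both hosts fall in the same network under `netmask`."""
    # strict=False lets us pass a host address and have the host bits masked off.
    net_a = ipaddress.ip_network(f"{a}/{netmask}", strict=False)
    net_b = ipaddress.ip_network(f"{b}/{netmask}", strict=False)
    return net_a == net_b


print(same_subnet(scfcu1, scfcu2, "255.255.255.0"))  # False: traffic goes via the gateway
print(same_subnet(scfcu1, scfcu2, "255.255.0.0"))    # True: hosts treat each other as on-link
```

Note that the wider mask only helps if both servers really sit on the same physical segment or VLAN; if they do not, they will try to reach each other directly and fail.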
 
Me again ;-)

I am still having the issue that when people go to save a document to the shared drive it comes back to the client "out of memory or disk space error". Although it seems to be less frequent.

The other application involves an Access database on the network share that is in use by about 10 clients (separate DBs). They get an error updating records (it's a VB program); they press OK in the message box and then it goes through (always in the same place in the program).

On a machine where the error occurred I captured the screen and went to save it to another share and got the same message. After retrying several times it finally saved, and then the user was OK.

I have all my server and workstation patches up to date, no viruses (Symantec Corporate Edition 8+).

I have also replaced ALL my old hubs with switches, so I now have the Dell 3048 switch and then, off 3 of its ports, Dell 2016s.

I did not mention that I had two servers before in a replicated domain environment. the other server is on the 192.168.2.x network.
 
One more thing to mention the one server (SCFCU1) is a Windows 2000 SBS, the other (SCFCU2) is a W2k Server acting as a replicated domain server.

Also I had to apply all the server patches to the SBS because the 1a patch for SBS said I was not running SBS, figure that.
 
I've had this when saving directly to a mapped share. Once I moved the files into a directory on the share, all worked well (instead of saving to M:\ I save to M:\folder\).
 
Johan,

Couple of things to try

1.) Whenever this problem occurs, look at the switch ports straight away to check whether any light blinks suspiciously, I mean differently from all the other machines. You probably know how the light blinks when a NIC is storming packets: all the other machines will be blinking quite fast, but this light will be blinking very slowly compared to the others.
2.) If there are many hubs/switches cascaded, disconnect all the hubs/switches and separate the servers' hub from the rest. Then see whether the problem occurs again. One by one, start connecting the hubs/switches again. If the problem starts occurring after you connect a particular hub/switch, get down to the PC level: remove all the machines from that hub/switch, then connect the machines one by one and monitor. Once you have tracked down the PC, check the installed software and do an antivirus scan. If nothing is found, swap its NIC with one from another machine and connect the other machine to find out whether the problem still occurs. In this way you can trace whether it is software or the NIC. If it's a virus, it may have spread to other machines too, and an extra bit of effort would be required. But let's try this first.

I know this would require a lot of network downtime, so you can plan it for the next greenzone. It’s a big task, but the problem has also been pending for a long time. Also, let us know the outcome; in the meantime we may come up with some other clues as well.

All the Best,
Samir
Success is Failure turned inside out
 
HI akakillroy,

Did you check the free space on your server? Sometimes Windows does not show the free space correctly. There is good software called "TreeSize Professional" to find the free and used space on your machines.

Good luck

Cheers
 
Johann,
Here is my 2 pence worth...

What are the server CPU loads like when you are experiencing this issue?
If very little, then it may be a broadcast hog on the NICs. e.g. a master browser election or something, that cannot be resolved and causing a loop.

Try a network sniffer on one of the servers (Ethereal is pretty good, and free), or use the supplied Network Monitor to see if there are excessive broadcasts at the time. It may not be election issues (this was an example), but you can use browmon.exe (NT reskit) to view what is going on with this.

This obviously depends on the configuration of your network devices (e.g. bridging\routing\switching etc).
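
If you can export the capture as a list of frames, spotting a broadcast burst is just a matter of counting. A rough sketch only: the `broadcasts_per_second` name and the `(timestamp, destination MAC)` record format are assumptions for illustration and are not tied to any particular sniffer's export format, and the capture data below is made up.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"  # Ethernet broadcast destination address


def broadcasts_per_second(
    frames: Iterable[Tuple[float, str]],
) -> Dict[int, int]:
    """Count broadcast frames in each whole-second bucket.

    `frames` is a hypothetical capture export: one
    (timestamp_in_seconds, destination_mac) pair per frame.
    """
    counts = Counter()
    for timestamp, dst_mac in frames:
        if dst_mac.lower() == BROADCAST_MAC:
            counts[int(timestamp)] += 1
    return dict(counts)


# Made-up capture: a burst of broadcasts during second 12,
# one unicast frame, and one more broadcast in second 13.
capture = [(12.0 + i / 100, BROADCAST_MAC) for i in range(50)]
capture.append((12.5, "00:11:22:33:44:55"))
capture.append((13.2, "FF:FF:FF:FF:FF:FF"))
print(broadcasts_per_second(capture))
```

Lining up the busiest seconds against the times the pings stop would show whether broadcasts have anything to do with it.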

Good luck!
 
Oh, and another thing, if you are running older OS's, you may have a rogue program somewhere on the network that may cause this. They tend to be found on the older PC's in some strange corner of the network. The authentication demands of some programs can lead to this (esp going from an 'open' OS to NT+).

We had an IIS 4.0 transaction server-based piece of sh... sorry, software that was badly written and attempted to authenticate to the PDC around 5000 times a second when someone requested a certain page. Just a word of warning...

Good luck!
 
What are the server CPU loads like when you are experiencing this issue?

The CPU load is 5% or lower.

 