Bad NIC?

FloDiggs

MIS
Jan 20, 2007
296
US
We are having a problem with the most important database server in our company. Users sporadically receive an error from the application that interfaces with the server, and then they have to log back in. It is an Oracle database, and the errors we are receiving from Oracle indicate that the cause is probably a network problem. We switched NICs and switch ports last night, and the problem is still occurring today with the second NIC.

Using perfmon to monitor the NIC, we are getting a continuous value of 1 for the Packets Outbound Errors counter on the NIC, but not on the loopback. All other counters look normal. Unfortunately, we didn't run the counters against the first card.

My manager wants to replace the patch cable and patch panel port, but since the only errors we see are in the Packets Outbound Errors counter, a bad physical connection doesn't make much sense to me. Does anyone have suggestions for finding better proof for or against a bad NIC?
 
Sure, it can still be a bad physical connection if you run full duplex.

It could also be a problem with STP on the switches.


"We must fall back upon the old axiom that when all other contingencies fail, whatever remains, however improbable, must be the truth." - Sherlock Holmes

 
Can you expand on what kind of STP problems would cause this? I see that as more likely than a physical problem.
 
I ran into a problem with standard STP running on HP ProCurve switches that would interfere with Oracle connections. I never figured out why, but disabling STP on the switch hosting the Oracle DB solved the problem.

YMMV.


 
Well, we have Cisco switches, and the port for the server has PortFast configured. Unfortunately, disabling STP is not an option because of our topology. Any other suggestions?
 
My DBA just gave me the error that we are receiving, which is an OS-level error.

net helpmsg 10054

An existing connection was forcibly closed by the remote host.

The server runs Windows Server 2003 with SP2.
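
For anyone triaging similar logs, here is a minimal sketch of a lookup for a few Winsock error codes that tend to show up alongside 10054. The table is deliberately tiny and the function name `describe_winsock_error` is made up for illustration; `net helpmsg <code>` on the server remains the authoritative source for the message text.

```python
# Small, hand-picked subset of Winsock error codes (not a complete list).
WINSOCK_ERRORS = {
    10053: "WSAECONNABORTED: connection aborted by software on the local host",
    10054: "WSAECONNRESET: existing connection forcibly closed by the remote host",
    10060: "WSAETIMEDOUT: connection attempt or established connection timed out",
    10061: "WSAECONNREFUSED: target machine actively refused the connection",
}


def describe_winsock_error(code):
    """Return a short description for a Winsock error code, if we know it."""
    return WINSOCK_ERRORS.get(code, "unknown code %d; try 'net helpmsg %d'" % (code, code))


if __name__ == "__main__":
    print(describe_winsock_error(10054))
```

A 10054 on the client only says that the other end reset the connection. It does not say why, which is consistent with the cause turning out to be on the server rather than on the wire.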
 
We switched out all patch cables and the ports on the patch panels last night to rule out all physical-layer possibilities. Just prior to that, we found a way to recreate the issue. We use DameWare to manage our servers, and our DameWare sessions were getting disconnected with the same error that Oracle was giving us. We can force this to happen by kicking off a new instance of the Oracle-backed app. When that is done on a new machine, it copies 180 MB of data down to the PC and in the process boots several people. It seems we are hitting some sort of connection limit in the OS. From the little research I have done, I can't find a finite connection limit listed for Server 2003 Enterprise Edition. The box has four Hyper-Threaded 3.6 GHz Xeon processors and 8 GB of RAM. Any suggestions?
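
One way to test the connection-limit theory is to tally TCP connections by state while reproducing the drop. The sketch below parses text in the shape of `netstat -an` output on Windows. The sample rows (addresses and ports) are invented for illustration, and `count_tcp_states` is a hypothetical helper, not part of any tool mentioned in this thread.

```python
from collections import Counter

# Hypothetical sample in the layout of Windows `netstat -an` output.
SAMPLE_NETSTAT = """\
  Proto  Local Address          Foreign Address        State
  TCP    10.0.0.5:1521          10.0.0.21:3345         ESTABLISHED
  TCP    10.0.0.5:1521          10.0.0.22:4012         ESTABLISHED
  TCP    10.0.0.5:1521          10.0.0.23:1188         TIME_WAIT
  TCP    0.0.0.0:1521           0.0.0.0:0              LISTENING
  UDP    0.0.0.0:161            *:*
"""


def count_tcp_states(netstat_text):
    """Count TCP connections per state; UDP rows and headers are skipped."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # TCP rows have four columns: proto, local, foreign, state.
        if len(fields) >= 4 and fields[0].upper() == "TCP":
            counts[fields[3]] += 1
    return counts


if __name__ == "__main__":
    for state, n in sorted(count_tcp_states(SAMPLE_NETSTAT).items()):
        print(state, n)
```

On a live server you would feed it the real command output (for example, captured to a file) before and during the 180 MB copy and compare the totals.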
 
Found the problem last Friday, but didn't get around to updating until now. Last March we put the /3GB switch into the boot.ini file to allow Oracle to use more memory per process. It turns out that with only 1 GB of address space left for the OS, the kernel was running out of memory for the number of TCP/IP connections the server was trying to maintain. We added the /USERVA=2900 switch, which lets you be more specific about how much memory you allot to applications and how much to the OS. Giving the kernel the extra 172 MB made the issue go away.
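
The 172 MB figure follows from the 4 GB virtual address space of 32-bit Windows, which is split between user mode and the kernel. A minimal sketch of the arithmetic, assuming the usual values (2048 MB user space by default, 3072 MB with /3GB, and /USERVA given in MB); the function name is made up for illustration:

```python
TOTAL_VA_MB = 4096  # 32-bit virtual address space, in MB


def kernel_space_mb(user_va_mb):
    """Kernel share of the address space for a given user-mode share."""
    return TOTAL_VA_MB - user_va_mb


default_kernel = kernel_space_mb(2048)   # no boot.ini switches
with_3gb = kernel_space_mb(3072)         # /3GB
with_userva = kernel_space_mb(2900)      # /3GB /USERVA=2900

print("default kernel space:", default_kernel, "MB")
print("/3GB kernel space:", with_3gb, "MB")
print("/3GB /USERVA=2900 kernel space:", with_userva, "MB")
print("gained back by /USERVA=2900:", with_userva - with_3gb, "MB")
```

So /USERVA=2900 trims the user-mode space from 3072 MB to 2900 MB and hands the difference back to the kernel.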

 