Hi,
I have a large system running on Solaris 8, with client software running on Windows. The server runs multiple instances of a single executable, each instance being dedicated to handling the TCP/IP comms to a specific client.
The server typically runs 1500+ concurrent instances of the executable, with around 350 having an active connection at any time.
No changes have been made to the overall application for a considerable time (approx. 1 year).
Recently, there have been two separate occasions where all 350ish active connections have simultaneously failed, with recv() returning zero bytes read.
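To be clear about what "zero bytes read" means here: as I understand it, a return of 0 from recv() on a TCP socket (with a non-zero buffer length) indicates an orderly close from the other end, whereas a reset or timeout would show up as -1 with errno set. The following is only a minimal sketch of that distinction, not the actual server code; the names recv_outcome, classify_recv and read_and_log are hypothetical and made up for illustration.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical outcome codes, for illustration only. */
enum recv_outcome {
    RECV_DATA,        /* n > 0: payload bytes arrived */
    RECV_PEER_CLOSED, /* n == 0: orderly shutdown seen from the other end */
    RECV_RETRY,       /* n < 0 with EINTR/EAGAIN/EWOULDBLOCK: try again */
    RECV_ERROR        /* n < 0 with anything else, e.g. ECONNRESET */
};

/* Map a recv() return value and the errno captured right after it
 * to one of the outcomes above. Assumes the buffer length was > 0. */
static enum recv_outcome classify_recv(ssize_t n, int err)
{
    if (n > 0)
        return RECV_DATA;
    if (n == 0)
        return RECV_PEER_CLOSED;
    if (err == EINTR || err == EAGAIN || err == EWOULDBLOCK)
        return RECV_RETRY;
    return RECV_ERROR;
}

/* Do one recv() on fd and log the two failure cases separately,
 * so a clean close and a hard error can be told apart in the logs. */
static enum recv_outcome read_and_log(int fd, char *buf, size_t len)
{
    ssize_t n = recv(fd, buf, len, 0);
    int err = (n < 0) ? errno : 0;  /* save errno before any other call */
    enum recv_outcome out = classify_recv(n, err);

    if (out == RECV_PEER_CLOSED)
        fprintf(stderr, "fd %d: recv returned 0 (connection closed)\n", fd);
    else if (out == RECV_ERROR)
        fprintf(stderr, "fd %d: recv failed: %s\n", fd, strerror(err));
    return out;
}
```

In the failures described above, every active instance ended up in the RECV_PEER_CLOSED case at the same moment.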
I have been informed that the network is beyond reproach, and that the problem must lie within the software. However, I cannot understand how 350+ independent instances of a process could simultaneously encounter the same bug.
Can anyone offer a possible explanation of how this could happen?
Thanks,
Flibs.