Windows 2K3 disappears from network every sixth hour from its previous disconnection

Status
Not open for further replies.

sriyapaka

Technical User
Jul 21, 2012
AU
Hi all,

I have a problem in a production environment. We are running W2K3 SP2 with SQL 2008. The server gets disconnected from the network every sixth hour, counted from its previous disconnection. The SQL resources keep running fine, but the server itself becomes unreachable. By "server" I mean its IP address, DNS name, etc.

While closely monitoring the issue, we found that even the RDP connection to it drops at exactly the same time. It is just an interruption of 10-30 seconds, after which everything is normal. The only clue is that it happens exactly every sixth hour. We also found that it happens to all the servers in the VLAN at the same time.

By "sixth hour" I mean this pattern: if it happens at 9:15 am, it will happen again around 3:16 pm, then 9:17 pm, then the next day at 3:18 am, 9:19 am, 3:20 pm, 9:21 pm, etc. So it is not exactly every six hours; it is every sixth hour from the previous disconnection. The time keeps drifting.
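
The drift in those times can be checked with a little arithmetic. Below is a minimal Python sketch, assuming the times listed above are accurate to the minute; the calendar date and the helper names `intervals` and `predict_next` are made up for illustration. It measures the gap between consecutive outages and extrapolates the next ones, which may help when hunting for a timer that restarts after each event rather than firing on a fixed schedule.

```python
from datetime import datetime, timedelta


def intervals(observed):
    """Return the gaps between consecutive outage timestamps."""
    return [later - earlier for earlier, later in zip(observed, observed[1:])]


def predict_next(observed, count=3):
    """Extrapolate upcoming outages using the mean observed gap."""
    gaps = intervals(observed)
    mean_gap = sum(gaps, timedelta()) / len(gaps)
    return [observed[-1] + mean_gap * (i + 1) for i in range(count)]


# Times of day come from the pattern described in the post;
# the calendar date is arbitrary and only there to make datetimes.
observed = [
    datetime(2012, 7, 21, 9, 15),
    datetime(2012, 7, 21, 15, 16),
    datetime(2012, 7, 21, 21, 17),
    datetime(2012, 7, 22, 3, 18),
]

gaps = intervals(observed)
upcoming = predict_next(observed, count=2)

print("gaps:", [str(g) for g in gaps])
print("next:", [t.strftime("%H:%M") for t in upcoming])
```

With these sample times every gap comes out as 6 hours 1 minute, which would be consistent with a 6-hour timer that only restarts once the roughly one-minute event has finished. That reading is a guess, not something the data proves.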

Note:

DHCP is not enabled on the server; it is configured with a static IP address. The server is cluster based. There is no DHCP server in the environment.
The DHCP Client service is running on the machine so that it registers with the DNS server. Windows Firewall is not enabled.

So it looks like a configuration issue: something is configured to run every sixth hour, counted from its previous run.

Please let me know if you suspect anything.

Thanks
Jadia
 
It is found that the server gets disconnected from the network
How did you determine this? I would have used a continuous ping from a command prompt to the IP address of the server and see if it drops IN CONJUNCTION with another continuous ping from a command prompt to the server via NAME. I would guess that the ping to server IP would continue, but to name would crap out.

If that's the pattern, what are the DNS settings on the clients set for - they are the ones trying to reach the server and failing. It should be solely the IP address of the W2K3 server.
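
The two side-by-side pings can also be scripted so that every sample is timestamped and the IP and name results are compared directly. This is only a rough Python sketch: the helper names, the address `10.0.0.5` and the host name `sqlhost` are placeholders, and it relies on the exit code of the system `ping` command, which can be misleading (on Windows, for instance, a "Destination host unreachable" reply from a router may still exit with 0).

```python
import platform
import subprocess
import time
from datetime import datetime


def ping_command(target, timeout_s=1):
    """Build a one-echo ping command (Windows flags, else Linux-style flags)."""
    if platform.system() == "Windows":
        # -n: echo count, -w: reply timeout in milliseconds
        return ["ping", "-n", "1", "-w", str(timeout_s * 1000), target]
    # -c: echo count, -W: reply timeout in seconds (Linux iputils)
    return ["ping", "-c", "1", "-W", str(timeout_s), target]


def is_reachable(target):
    """Return True if a single ping to target exits with status 0."""
    result = subprocess.run(
        ping_command(target),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def classify(ip_ok, name_ok):
    """Turn the two ping results into a short diagnosis."""
    if ip_ok and name_ok:
        return "ok"
    if ip_ok:
        return "name-only failure (check DNS)"
    if name_ok:
        return "ip-only failure (name may resolve elsewhere)"
    return "both down (network path or host)"


def log_line(ip, name, now=None, check=is_reachable):
    """Probe both targets once and return a timestamped status line."""
    now = now or datetime.now()
    return f"{now:%Y-%m-%d %H:%M:%S} {classify(check(ip), check(name))}"


def monitor(ip, name, samples, delay_s=1.0, check=is_reachable):
    """Print one status line per sample, pausing delay_s between samples."""
    for _ in range(samples):
        print(log_line(ip, name, check=check))
        time.sleep(delay_s)
```

Something like `monitor("10.0.0.5", "sqlhost", samples=3600)` started shortly before the expected drop would give an hour of per-second samples. A run of "name-only failure" lines points at name resolution, while "both down" points at the network path or the host itself.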
 
Thanks for your time.

It's worth pinging both the IP and the DNS name together. Will try soon.

But I don't think it is a client setting, because there are redundant application servers and they all lose connectivity together.

Note that even the RDP connection from a local machine to the SQL server drops at the same time. The RDP connection is made via IP address, so I think that answers your question of whether it is the IP or the name that fails.
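
Since RDP over the IP address also drops, a TCP-level check can complement ICMP pings. The following is a small Python sketch using the standard `socket` module; the function name `tcp_port_open` is hypothetical, and it assumes the services listen on their default ports (3389 for RDP, 1433 for SQL Server), which may not match this environment.

```python
import socket


def tcp_port_open(host, port, timeout_s=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False
```

Calling `tcp_port_open("10.0.0.5", 3389)` (placeholder address) in a loop during the outage window would show whether new TCP connections fail for the whole 10-30 seconds, or whether only existing sessions are reset.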

We also found that all servers in the VLAN go down together every sixth hour. I am wondering what that 6-hour setting is.

Thanks
Jadia
 
Run Process Monitor (Procmon.exe) on the cluster servers just before the 6th hour?
Are there any errors on the switches related to lost connections or delays?



........................................
Chernobyl disaster..a must see pictorial
 
Thanks for your time. Will suggest running Process Monitor.

We tried perfmon.exe before. During the issue, i.e. the 10-30 seconds, we got disconnected from the server,
and when we logged back in it was blank; perfmon had collected no data for that 10-30 second period.

The application servers receive a pre-login connection reset response from the DNS.

Network graphs (and, I think, the switches as well) show a sudden drop in data transfer to the server. Immediately after the issue, the data transfer is double the usual amount for a few seconds, and then it returns to normal. I believe the doubled traffic is the backlog caused by the interruption.

Thanks
Jadia
 
Have you run DcDiag.exe /v and NetDiag.exe /v?
Sounds like you are almost due for a packet trace.
Doesn't sound like resource exhaustion, as I have not seen that recover in 10-30 seconds.
If you do not find an answer soon, a support call to MS might be economical, as long as they fix it and document all the reasons/steps in the process.



 
These servers don't happen to be virtual, do they? Sounds like your SAN might be running a snapshot every 6 hours.



RoadKi11

"This apparent fear reaction is typical, rather than try to solve technical problems technically, policy solutions are often chosen." - Fred Cohen
 
Thanks, will check on it.
Why do you say 6 hours? Do you remember any default snapshot interval being 6 hours?

Thanks
Jadia
 
My SAN does it every 4 hours, but it's adjustable. Yours could be set up differently.



RoadKi11
 
All,

Recently we found that the design of our application changed slightly after the latest project.

Before the project, all the application servers and the SQL server were in the same VLAN.
After the project, the application server is still on the old VLAN and the SQL server has moved to a new VLAN.

Now the application server reaches SQL via the application server VLAN's firewall,
and the SQL server reaches the application server via the SQL server VLAN's firewall.

Before the project, there was no firewall at all between the application and SQL servers.

It looks to me like requests to SQL go via one firewall and responses from SQL come back via another firewall.

As mentioned earlier, the issue is that the SQL server (i.e. its IP and DNS name) disappears from the network every 6th hour for 10-30 seconds.

Please let me know if you suspect anything given this additional detail.

In answer to the earlier question: no snapshots are taken in our environment.

Thanks
Jadia

 
Recently we ran traces on all our network devices, but found nothing wrong around the expected issue time.

All our servers in the VLAN are clustered.

We found that a spike occurs during the issue time on the cluster's private network, i.e. the heartbeat network.

Still not sure what that 6-hour parameter is.

Thanks
Jadia
 