
Computers Stop Responding


rocmills

We have had an extremely unusual problem with our office computers for many years now, and no one has been able to help us track down the problem. About half of our computers are logged in to our server, and the other half only map to server directories but the users log into their workstations locally.

The server is running Windows 2000 Server edition and the various workstations are running XP Pro SP2 (we have one Vista laptop). These machines are all less than 2 years old as we had to replace everything after a break-in early in 2005 (what makes this problem especially strange is that it was present on the old hardware, too). We have a mixture of Dell Dimension machines as well as home-built machines.

Each and every day, without fail, at roughly 4:00 p.m., the computers - or the programs on them - stop responding. The network does not show any activity, Task Manager does not show any tasks consuming huge amounts of memory or CPU. But if you try to launch Word by opening an existing document, it just hangs and hangs for upwards of 10 minutes at a time. When it finally does respond, you might have two minutes to work before it hangs again. The same is true of Outlook, Excel, PhotoShop, QuickBooks, AutoCAD, browsers, Explorer, and on.

I was certain that when we replaced EVERYTHING in 2005, the problem would go away, but it didn't. Even our server was stolen, though all the data was backed up and we restored it to our new machine.

What could be causing such a consistent and widespread problem? This has gone on for a good 5 years, if not more, so I'm willing to test out just about anything.

Thanks for listening!

--Roc
 
If this has been going on for so long I imagine that you have tried everything known to man to fix the problem, so things suggested here may have been tried already.

Are there any long forgotten Scheduled Tasks, Batch Files or Scripts set to run on your Server at 4.00pm that might involve unknown or deceased machines or locations?

How do you eventually recover from this problem in the current state of play, or does everyone knock-off and go home at 4.00pm?

What is in your office environment or surrounding area that would become active at around 4.00pm each day? Is there any large machinery or industrial plant that would startup or close down at that time? What are your neighbors up to?


Have you checked to see if the problem occurs at say 4.00pm on a Sunday or Saturday or public holiday?
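One way to answer the weekend question without coming in is to leave a small probe running on a workstation and read the log on Monday. Below is a minimal sketch, assuming Python is installed on that workstation; the function names, the CSV layout, and any drive letter you pass in are hypothetical, not something from the original setup. It simply times how long a directory listing on the mapped share takes and appends the result to a file.

```python
import csv
import os
import time
from datetime import datetime


def time_directory_listing(path):
    """Return the seconds taken to list `path`, or None if the listing fails."""
    start = time.perf_counter()
    try:
        os.listdir(path)
    except OSError:
        return None
    return time.perf_counter() - start


def log_probe(path, log_file):
    """Append one timestamped measurement to a CSV file and return the row."""
    elapsed = time_directory_listing(path)
    row = [
        datetime.now().isoformat(timespec="seconds"),
        path,
        "FAILED" if elapsed is None else f"{elapsed:.3f}",
    ]
    with open(log_file, "a", newline="") as handle:
        csv.writer(handle).writerow(row)
    return row


def run_probe(path, log_file, samples, interval_seconds=60):
    """Take `samples` measurements, sleeping `interval_seconds` between them."""
    for _ in range(samples):
        log_probe(path, log_file)
        time.sleep(interval_seconds)
```

For example, `run_probe("S:/", "C:/probe_log.csv", samples=4320)` would cover roughly three days at one sample per minute (both paths are placeholders). Slow or FAILED rows clustered around 4 p.m. on a Sunday would point away from anything the staff are doing.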

Probably nothing in this link, but it is the nearest I have come across that is even remotely similar to your problem.

Mapped drives failing
thread779-1394389

There was one link I vaguely remember where I think the problem was suggested to have been caused by Aliens, maybe they have struck again?

Good luck with whatever it is, but I feel after so long if you ever did solve it, it would be like losing a dear friend and you would miss the drama.
 
Good one linney! My first thoughts were also power-related...every afternoon, starting around the same time, my network connection goes squirrelly. It randomly disconnects and re-connects the switch to the server, along with the little balloon "Local Area is now connected".

I traced it to bad power in our office, as that time of day all the wall A/Cs are going full blast along with (2) laser printers and an oversized engineering copier. All these devices running concurrently were dropping our AC power below an acceptable level for our switch's UPS. So, I bought a better UPS and the problem went away, for the most part. What used to happen daily now happens a few times a week, and just for a few seconds.

I would not be surprised to find that this was the same root cause. Put a meter across your power at 4:00 pm and see what happens. Or, get an expensive online UPS for the switch that keeps the power constant no matter what.

Tony

Users helping Users...
 
Sure does sound like a "brown out" as Tony mentioned...

The other thing that comes to mind, would be radio interference, which drops the Network links, causing all the workstations to try to reconnect which in turn causes the hang...

e.g. at a company I worked at, they noticed that every day at around noon (+/- 30 min.) the office AP and certain network links would be inaccessible for 15 min or so... to cut the story short, it turned out to be 3 interns who were heating up their lunch in the office kitchen (microwave)...

Ben

"If it works don't fix it! If it doesn't use a sledgehammer..."
 
I'll call it to everybody's attention that switching power supplies tend to clip the peaks of the sine wave voltage and current. With enough clipping going on in the circuits of a single transformer the weakest link with a switching supply may not be getting enough juice to keep the switcher working as hard as it needs to be.

And although the OP said all this stuff is connected to the server or mapped to the server, in reality there is probably only one RJ45 on the server so the signals are funnelled through something else that probably wasn't stolen because it is never really noticed. ( This assumption requires elimination of the possibility of thicknet or thinnet )


Ed Fair
Give the wrong symptoms, get the wrong solutions.
 
I'm sure you have tried all these suggestions in the past 5 years but I'll ask anyway.

Do the symptoms (for any one station) go away if that station totally disconnects from the server (remove the network cable), then reappear when connection is resumed?

Have you noticed excessive collisions on your switch at the time of the slow response?

Have you eliminated each workstation one at a time to maybe narrow down the problem, could be a bad NIC on one station?

Have you checked logs on servers and workstations?

Do you have any workstation mapped drives that are not connected?

Have you removed any suspect peripherals?

Got enough memory on the server?

You really need to figure out if the problem is with a workstation, your server, your network etc.
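A rough way to separate "can't find the server" from "can't reach the server" is to test name resolution and the TCP connection as two distinct steps from a workstation while the hang is in progress. The sketch below is only an illustration: the function name is made up, and port 445 (SMB file sharing over TCP) is an assumption about how the shares are being reached.

```python
import socket
import time


def check_server(host, port=445, timeout=5.0):
    """Resolve `host`, then try a TCP connection; report each step separately."""
    result = {"host": host, "resolved": None, "connected": False, "seconds": None}
    start = time.perf_counter()
    try:
        # Step 1: name resolution. A failure here leaves "resolved" as None.
        result["resolved"] = socket.gethostbyname(host)
        # Step 2: TCP connection to the resolved address.
        with socket.create_connection((result["resolved"], port), timeout=timeout):
            result["connected"] = True
    except OSError:
        # Covers resolution errors, refused connections, and timeouts.
        pass
    result["seconds"] = round(time.perf_counter() - start, 3)
    return result
```

If `resolved` comes back as `None` during a hang, the name lookup is the suspect; if the address resolves but `connected` stays `False`, look at the server or the network path instead.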

Again I'm certain you have tried all the above more than once but you asked for ideas.

GOOD LUCK!!

sam




 
Thank you, everyone, for all the ideas. I've not yet had a chance to try the new suggestions out, and may not be able to until tomorrow as we are short-handed today.

These hangs come and go; they aren't permanent enough to kill production at 4 p.m., but they do slow productivity. When it happens, it may be 2-3 minutes before your applications start to respond again, and then they may hang again a moment later or 10 minutes later.

The UPS was recently replaced, and I believe the router was as well.

As my workstation is immediately next to the server, I will check for error logs and try disconnecting from the server at the critical time this afternoon or tomorrow.

Thank you all again, I'll work on those suggestions and get back to you with results as soon as I can.

--Roc
 
No scheduled tasks on server.

Q: How do you eventually recover from this problem in the current state of play, or does everyone knock-off and go home at 4.00pm?

A: The problem either eventually resolves itself, or we just suffer with severely decreased productivity for the last hour or two of work.

Q: What is in your office environment or surrounding area that would become active at around 4.00pm each day? Is there any large machinery or industrial plant that would startup or close down at that time? What are your neighbors up to?

A: I would doubt local environmental influence as we moved across town last October. Both suites on either side of us at our current location are vacant. We are currently near some railroad tracks, but otherwise there is no heavy industry in our immediate area.

Q: Have you checked to see if the problem occurs at say 4.00pm on a Sunday or Saturday or public holiday?

A: You mean come to work when I don't have to? LOL Seriously, though, I have not checked this.

Q: There was one link I vaguely remember where I think the problem was suggested to have been caused by Aliens, maybe they have struck again?

A: Believe me, I've been wondering about ghosts in the machine since this first started.

Q: My first thoughts were also power-related...every afternoon, starting around the same time, my network connection goes squirrelly. It randomly disconnects and re-connects the switch to the server, along with the little balloon "Local Area is now connected". I traced it to bad power in our office, as that time of day all the wall A/Cs are going full blast …I would not be surprised to find that this was the same root cause. Put a meter across your power at 4:00 pm and see what happens. Or, get an expensive online UPS for the switch that keeps the power constant no matter what.

A: We're in Vegas, so the A/C is running 24/7 except in the very heart of winter. The only thing plugged into our UPS is the server box and the monitor, everything else goes straight to a wall outlet.

This event is recorded on the server for the offending time of day, each and every day of the week (it even happens early in the morning and late at night when no one is here). I suspect this to be the culprit, but I have no idea how to resolve this or prevent it from happening again. I may be a computer geek, but I'm not a network geek.

Event ID: 5781 Category: None Type: Warning Source: NETLOGON
"Dynamic registration or deregistration of one or more DNS records failed because no DNS servers are available."



Q: And although the OP said all this stuff is connected to the server or mapped to the server, in reality there is probably only one RJ45 on the server so the signals are funnelled through something else that probably wasn't stolen because it is never really noticed. ( This assumption requires elimination of the possibility of thicknet or thinnet )

A: Embarrassed now; I'm not sure I know what an RJ45 is… but if it is the router or switch, you are correct that they were not stolen, but they were replaced when we upgraded from business DSL to a T1 connection last year. Thicknet and thinnet are also unknown to me.

I will try to get to Sam's suggestions this afternoon.

Once again, thank you very much to one and all for jumping in with so many ideas and suggestions.

--Roc
 
RJ45 is the network connector on the server that the network cable plugs into. I mentioned it to indicate that there is another device between the server and the users, which you have now identified as a router. I was suggesting that you look into the router (or hub, or switch) as the potential source of the problem, but you have now answered that speculation.
 
These are the workstation event logs during the periods of non-responsive behavior. There were no corresponding event logs on the server.

Event Type: Warning
Event Source: LSASRV
Event Category: SPNEGO (Negotiator)
Event ID: 40960
Date: 8/18/2008
Time: 4:28:37 PM
User: N/A
Computer: FRONTDESK
Description: The Security System detected an attempted downgrade attack for server cifs/FILESERVER. The failure code from authentication protocol Kerberos was "There are currently no logon servers available to service the logon request.
(0xc000005e)".


Event Type: Warning
Event Source: LSASRV
Event Category: SPNEGO (Negotiator)
Event ID: 40961
Date: 8/18/2008
Time: 4:28:37 PM
User: N/A
Computer: FRONTDESK
Description: The Security System could not establish a secured connection with the server cifs/FILESERVER. No authentication protocol was available.


Event Type: Warning
Event Source: LSASRV
Event Category: SPNEGO (Negotiator)
Event ID: 40960
Date: 8/18/2008
Time: 4:52:54 PM
User: N/A
Computer: FRONTDESK
Description: The Security System detected an attempted downgrade attack for server PGA\fileserver$. The failure code from authentication protocol Kerberos was "There are currently no logon servers available to service the logon request.
(0xc000005e)".



Event Type: Warning
Event Source: LSASRV
Event Category: SPNEGO (Negotiator)
Event ID: 40961
Date: 8/18/2008
Time: 4:52:54 PM
User: N/A
Computer: FRONTDESK
Description: The Security System could not establish a secured connection with the server PGA\fileserver$. No authentication protocol was available.



Event Type: Warning
Event Source: BROWSER
Event Category: None
Event ID: 8021
Date: 8/18/2008
Time: 5:22:02 PM
User: N/A
Computer: FRONTDESK
Description: The browser was unable to retrieve a list of servers from the browser master \\FILESERVER on the network \Device\NetBT_Tcpip_{ED041F35-C95B-4919-94DE-AE8094499E44}. The data is the error code.


Event Type: Warning
Event Source: LSASRV
Event Category: SPNEGO (Negotiator)
Event ID: 40960
Date: 8/18/2008
Time: 5:28:45 PM
User: N/A
Computer: FRONTDESK
Description: The Security System detected an attempted downgrade attack for server cifs/FILESERVER. The failure code from authentication protocol Kerberos was "There are currently no logon servers available to service the logon request.
(0xc000005e)".



Event Type: Warning
Event Source: LSASRV
Event Category: SPNEGO (Negotiator)
Event ID: 40961
Date: 8/18/2008
Time: 5:28:45 PM
User: N/A
Computer: FRONTDESK
Description: The Security System could not establish a secured connection with the server cifs/FILESERVER. No authentication protocol was available.

I did not have the chance to disconnect from the server when this happens to see if functionality is restored. Hopefully will have the chance to try that trick tomorrow.
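With entries like the ones above, a quick tally by Event ID and hour can show whether the warnings really cluster in the late afternoon or are spread through the day. This is a minimal sketch that assumes the events are copied out of Event Viewer as plain text in the same "Field: value" layout shown above; the function names are made up for illustration.

```python
from collections import Counter
from datetime import datetime


def parse_events(text):
    """Split pasted Event Viewer text into dicts, one per 'Event Type:' block."""
    events, current = [], None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # continuation lines such as "(0xc000005e)" have no field name
        key, value = key.strip(), value.strip()
        if key == "Event Type":
            current = {}
            events.append(current)
        if current is not None and key not in current:
            current[key] = value
    return events


def count_by_hour(events):
    """Count events per (Event ID, hour of day in 24-hour form)."""
    counts = Counter()
    for event in events:
        if "Date" not in event or "Time" not in event:
            continue  # skip incomplete blocks rather than guess
        stamp = datetime.strptime(
            f"{event['Date']} {event['Time']}", "%m/%d/%Y %I:%M:%S %p"
        )
        counts[(event.get("Event ID"), stamp.hour)] += 1
    return counts
```

Running `count_by_hour(parse_events(pasted_text))` over a few days of logs gives keys such as `("40960", 16)` for a 40960 warning in the 4 p.m. hour, which makes it easier to compare against the times the hangs were actually noticed.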

--Roc

"Whatever one man can dream, another can accomplish" - Jules Verne
 
 
The cleaner isn't unplugging your server or a network device to clean your computer room at 4pm is she? lol

--------------------------------------
"Insert funny comment in here!"
--------------------------------------
 
Thanks everyone for the helpful and humorous replies. It happened later than usual yesterday, but it was completely crippling and of course came at a time when the boss wanted three things done immediately. I was literally banging my head against the keyboard and wanted to take a sledge hammer to the server.

I tried the registry edit fix from the Microsoft link - rebooting server and will have to wait until this afternoon to see if it changes things or not.

If it doesn't do the trick, looks to me as if I may need to call in a professional geek to solve this problem once and for all. Some of the other links make sense to me, just enough to be dangerous and I don't want to kill the server in an attempt to fix it. EventID wants money for the truly helpful answers - though that would be much less than calling in a technician.

--Roc
 
Well good luck; sometimes it's so much easier to diagnose when you're on site.

I hope you solve your problem and post the results!!

 
I will post again later tonight after I've seen if the last two changes I made actually do the trick. Barring that, I will move on with some of the things Sam wrote about.

If it does come down to bringing in a specialist, I will watch over his shoulder and then post the resolution here.

As previously stated, I did the registry thing that Microsoft wrote about, and I also renamed the Netlogon files, per another MS suggestion - though, honestly, I've been getting intermittent weird behavior early in the days since following these steps:

1. Stopped the Netlogon service.
2. Renamed Netlogon.dns and Netlogon.dnb to *.old.
3. Restarted the Netlogon service.

Now I just have to wait about 3 hours to see what happens next.
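For anyone repeating the three steps above, they can be sketched as a small script. Treat this as an illustration under assumptions, not a tested procedure: the function name is made up, the folder holding the two files has to be supplied by you (it is commonly given as the `System32\Config` folder under the Windows directory, but verify that on your own server), and appending `.old` to the full file name is my reading of "*.old". It defaults to a dry run that only lists what it would do.

```python
import os
import subprocess


def reset_netlogon_dns_files(config_dir, dry_run=True):
    """Stop Netlogon, rename its DNS record files to *.old, restart Netlogon.

    Returns the list of actions taken (or planned, when dry_run is True).
    """
    actions = []

    def service(verb):
        actions.append(f"net {verb} netlogon")
        if not dry_run:
            # "net stop" / "net start" are the standard Windows service commands.
            subprocess.run(["net", verb, "netlogon"], check=True)

    service("stop")
    try:
        for name in ("netlogon.dns", "netlogon.dnb"):
            source = os.path.join(config_dir, name)
            if os.path.exists(source):
                actions.append(f"rename {name} to {name}.old")
                if not dry_run:
                    os.replace(source, source + ".old")
    finally:
        # Always try to bring the service back, even if a rename failed.
        service("start")
    return actions
```

Calling it with the default `dry_run=True` prints nothing and changes nothing; it just returns the planned actions so they can be reviewed before anything is run on a live server.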

--Roc

 
Well, it didn't happen yesterday! As I've said before, this is usually a daily occurrence, but I won't start to celebrate until I've had several days in a row without a hang.

The two changes I made were the registry change recommended by Microsoft:
Configure the Netlogon service to depend on the DNS service. This will cause the Netlogon service to start after the DNS service starts. To do this, run REGEDT32, and go to:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Netlogon


And the change mentioned in my last post. I have trouble believing those two simple things could do the trick, so we shall wait and see.
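To confirm the registry change took effect, the service's dependency list can be read back rather than trusted. The sketch below is read-only and rests on an assumption that is not in the quoted instructions: that service dependencies are stored in the `DependOnService` multi-string value under the key shown above, and that the DNS Server service is listed there as `DNS`. The function names are made up for illustration.

```python
NETLOGON_KEY = r"SYSTEM\CurrentControlSet\Services\Netlogon"


def depends_on_dns(dependencies):
    """Return True if the dependency list names the DNS service (any case)."""
    return any(name.strip().lower() == "dns" for name in dependencies)


def read_netlogon_dependencies():
    """Windows only: return Netlogon's DependOnService entries as a list."""
    import winreg  # imported here so the file still loads on other systems

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, NETLOGON_KEY) as key:
        # Assumption: the value is REG_MULTI_SZ, which winreg returns as a list.
        value, _value_type = winreg.QueryValueEx(key, "DependOnService")
    return list(value)
```

On the server, `depends_on_dns(read_netlogon_dependencies())` returning `True` would suggest the edit is in place; nothing here writes to the registry.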

Again, thank you everyone for the help and encouragement, I will keep you posted.

--Roc
 
CRAP!

As you may guess, it is still happening. Later in the day, which is better than nothing, but still - the last 30 minutes of the day I may as well not be using the computer.

Heavy sigh.

--Roc
 
Are the errors in the Event Viewer still the same as before or different?

Can you make use of a program such as Process Monitor and perhaps find out what is happening under the hood?

Process Monitor v1.37

Does Task Manager on either a Workstation or Server tell you anything?

Have you had a look at the Domain Controller Diagnostic tool(Dcdiag)?

DCDiag and NetDiag in Windows 2000 Facilitate Domain Join and DC Creation
 