Event Viewer - Send notification on Error ???

Status
Not open for further replies.

rfr100

Hi All,

I have a Windows 2000 Advanced Server cluster comprising two Compaq DL380s and an MSA 1000.

One of the servers in the cluster has developed a memory leak causing the cluster to fail over every 4-6 weeks.

We have identified the error 2020 as the one which points to the imminent failure of the cluster node.

While we wait on Microsoft to resolve the problem we would like to get a handle on when these errors occur.
Usually they pop up in a batch of about 5-10 in a couple of minutes and then the server fails over (doesn't crash!!)

Is there a way to monitor for particular system events?
I don't want to pay for a third party app as I'm only going to use this until the problem is fixed.

Cheers,

Rob
 
might be a printer/spooler prob.

error 2020, The indicated color transformation is invalid.

no clue what that means, maybe it helps you.

good luck
 
This is my error.
I know what it means, and how MS think you fix it.
But the fix no worky worky.


Event Type: Error
Event Source: Srv
Event Category: None
Event ID: 2020
Date: 15/04/2004
Time: 12:45:11
User: N/A
Computer: ******
Description:
The server was unable to allocate from the system paged pool because the pool was empty.

For more information, see Help and Support Center at
Data:
0000: 00040000 00540001 00000000 c00007e4
0010: 00000000 c000009a 00000000 00000000
0020: 00000000 00000000 00000004

My question stands.
Does anyone know a way (without 3rd party apps) to monitor for this error and get a notification? Whether it's SMTP or Windows Messenger, I'm not fussy (MUCH).
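
For anyone scripting this, the "batch of 5-10 errors in a couple of minutes" pattern described earlier in the thread can be checked with a sliding window over event timestamps. The following is only a rough sketch: it assumes the timestamps of the Event ID 2020 entries have already been pulled out of the System log by some other means (for example an exported log file), and the function name, the 5-event minimum and the 2-minute window are illustrative choices, not anything from the original setup.

```python
from datetime import datetime, timedelta


def find_burst(timestamps, min_events=5, window=timedelta(minutes=2)):
    """Return the time of the event that completes the first burst, or None.

    A burst is `min_events` or more events whose timestamps all fall
    within `window` of each other.
    """
    ordered = sorted(timestamps)
    start = 0
    for end, current in enumerate(ordered):
        # Slide the left edge forward until the window fits again.
        while current - ordered[start] > window:
            start += 1
        if end - start + 1 >= min_events:
            return current
    return None


# Hypothetical sample: five errors 20 seconds apart.
first = datetime(2004, 4, 15, 12, 45, 11)
events = [first + timedelta(seconds=20 * i) for i in range(5)]
print(find_burst(events))
```

Whatever reads the log would call find_burst on each pass and send the notification the first time it returns a timestamp.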
 
You ran out of page table entries. This is absolutely clear from the second dword of line 0010 in the dump data: C000009A means STATUS_INSUFFICIENT_RESOURCES. This is almost always a driver, and rarely an MS issue. You can use gflags and poolsnap to figure out which driver has the issue. Some common culprits are Wquinn and GroupShield, as well as certain HBA drivers. You're on an MSA1000, so I suspect you are using the Compaq/Emulex HBAs with the miniport driver. You're on a cluster, and are likely using the /3GB switch since you are running Advanced Server. Have you tried setting system pages to 30000?
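
To make the dword reading above concrete, here is a small sketch that parses the Data section of the event exactly as posted and looks up the second dword of line 0010. Only the C000009A mapping comes from this thread; the table holds just that one code, and the function names are made up for illustration.

```python
# Data section copied from the event posted earlier in the thread.
DUMP = """\
0000: 00040000 00540001 00000000 c00007e4
0010: 00000000 c000009a 00000000 00000000
0020: 00000000 00000000 00000004
"""

# Only the code discussed in the thread; not a complete NTSTATUS table.
KNOWN_STATUS = {0xC000009A: "STATUS_INSUFFICIENT_RESOURCES"}


def parse_dump(text):
    """Map each line offset (e.g. 0x10) to its list of dword values."""
    rows = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        offset, _, words = line.partition(":")
        rows[int(offset, 16)] = [int(word, 16) for word in words.split()]
    return rows


def status_name(value):
    """Return the known name for a status code, or a hex placeholder."""
    return KNOWN_STATUS.get(value, "unknown (0x%08X)" % value)


rows = parse_dump(DUMP)
code = rows[0x10][1]      # second dword of line 0010
print(status_name(code))  # STATUS_INSUFFICIENT_RESOURCES
```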

As far as what to do, you can use perfmon to set an alert for when the memory counters "Pool Nonpaged Bytes" and/or "Free System Page Table Entries" cross a threshold. If you also run counter logs through the next episode, you'll be able to determine a safe level to alert at. On the action of the alert, choose "run this program". The program would be a batch file that uses the cluster command line to move the groups to the other node. Alternatively, you could use one of the command line mailers like Blat in your batch file to send an SMTP message. You can send a message via Windows Messenger without running an external command.
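
If you would rather script the check than rely on a perfmon alert plus Blat, the logic is roughly as follows. This is a sketch under assumptions: the counter samples are presumed to come from a counter log you already collect, the threshold of 5000 free PTEs is a placeholder until your own logs show a safe level, and the addresses and the host smtp.example.com are hypothetical.

```python
import smtplib
from email.message import EmailMessage


def below_threshold(samples, threshold, consecutive=3):
    """True if the last `consecutive` samples are all below `threshold`.

    Requiring several low readings in a row avoids alerting on one dip.
    """
    recent = samples[-consecutive:]
    return len(recent) == consecutive and all(s < threshold for s in recent)


def build_alert(counter, value, threshold, sender, recipient):
    """Build (but do not send) the notification e-mail."""
    msg = EmailMessage()
    msg["Subject"] = "ALERT: %s at %d (threshold %d)" % (counter, value, threshold)
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("%s has dropped below the alert level." % counter)
    return msg


def send_alert(msg, host="smtp.example.com"):
    """Hand the message to an SMTP relay (hypothetical host name)."""
    with smtplib.SMTP(host) as smtp:
        smtp.send_message(msg)


# Hypothetical samples of "Free System Page Table Entries".
samples = [9000, 4000, 3500, 3000]
if below_threshold(samples, threshold=5000):
    alert = build_alert("Free System Page Table Entries", samples[-1], 5000,
                        "cluster@example.com", "admin@example.com")
    print(alert["Subject"])
```

For "Pool Nonpaged Bytes" the comparison would be reversed, since that counter rises as the pool fills.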

 
I was going to use PerfMon as a fallback.
Spot on with the diag though.

The company I work for is of such a size that in Central IT there are 30-40 Microsoft employees permanently assigned to our company.

Central IT took the lead on this issue and MS are running constant performance and diagnostics against this cluster.

Interestingly, MS don't think that the problem is driver related. We are updating our NAV tonight at MS's request; the next port of call is to increase the RAM by a gig !!

And round and round we go.

 
NAV has been a problem for a while. Start there. I've been to several companies in the US that have this issue. I don't work for MS (it's been close to a year now), but I have seen this problem 50 or 60 times. I know the error codes by heart. From the size of your MS presence, I can pretty much deduce the company. Bentonville or Bloomington. I was there a year or a year and a half ago.

 
Actually I work in Europe for a large Car Manufacturer.
 
Oh, that one. I didn't do much outside the US, so I guess I just assumed you were in the US. That'll teach me to make assumptions.

 
There is a light version of a program called Event Sentry that will send notifications via email.

-Ryan
 
Splendid. Have a star.

Actually since this post the server has gone to hell in a handcart.

We had to fail the cluster over because for some reason both processors decided to run at a constant 55-75%.

Server is now going to be rebuilt tomorrow.
 
What process was taking up all the CPU? Inetinfo? If so, I'd suspect your AV software has a synchronous SMTP transport event sink that's giving you the grief. The SMTP transport event sinks typically run in inetinfo's memory space. If you really wanted to prove it, you could load ADPlus and manually generate a userdump when it gets that way. Then, you could load the dump file up in kd or windbg and unwind the threads in the stack backtrace. You'll at least be able to see how many threads there are, and each thread's state. I bet you have a slew of them waiting on I/O. Each thread waiting on I/O has to buffer the data somewhere, and all that buffering can consume quite a few PTEs. Combine that with the /3GB switch, which can severely restrict the number of available PTEs, add in a slow leak because the driver is not properly releasing all the PTEs each time, and you have the problems you are experiencing.





 
Oddly, it was/is the System process.

System and System Idle are each using 25-35% CPU at all times.

This is a little off the original thread topic. But it's my thread so I don't mind. :)
 
That's sounding more like a filter driver then.
You'll need a kernel mode dump to figure that one out.


 