reboot (no warning) with /PAE on 2003 nodes

Status
Not open for further replies.

bsherman

Technical User
Mar 5, 2002
5
US
Hello, everyone.

I have a Windows 2003 cluster: two nodes, 12GB each. When the /PAE switch is on, the nodes reboot themselves without warning or any noticeable error. I think I've ruled out bad hardware because it occurs on more than one server. With /PAE off and only 4GB available to the OS, the cluster seems rock solid (1 month, no issues).

After the month-long honeymoon I turned on /PAE on node 2 and within 48 hours node 1 rebooted itself. I know that you won't believe that, but there it is. I turned off /PAE on node 2 and we've been stable on both nodes since.

Anybody seen this before?
 
Yes, we've seen this problem before. We are running a Windows 2000 cluster, two nodes with 8GB each. With the /PAE switch off, we're stable. Once it's on, the system reboots itself within 24-48 hours.
We switched hardware, so, as in your case, that isn't it.
Unfortunately, we haven't solved the problem. Have you?

One last question. Are you running SQL Server 2000 on either node?


 
Using the /PAE switch causes each PTE to take twice as much RAM, effectively halving the number of PTEs available to device drivers. I imagine this stress is causing something to blow.

In Windows 2000, you used to have a similar problem with just the /3GB switch. In Windows 2003, you can use the /userva option alongside /3GB to trim the user-mode address space and give some extra PTEs back to the kernel. In both Windows 2000 and Windows 2003, the /PAE switch cuts the number of available PTEs in half because each PTE takes twice as much space.
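To put rough numbers on the halving, here is a minimal Python sketch. It assumes the usual x86 figures of 4KB pages, 4-byte PTEs without PAE and 8-byte PTEs with PAE. The 1MB budget is a made-up figure for illustration, not what Windows actually reserves for system PTEs.

```python
PAGE_SIZE = 4096          # bytes per x86 page
NON_PAE_PTE_SIZE = 4      # bytes per PTE without /PAE
PAE_PTE_SIZE = 8          # bytes per PTE with /PAE


def ptes_per_page_table(pte_size_bytes):
    """Number of PTEs that fit in one 4KB page-table page."""
    return PAGE_SIZE // pte_size_bytes


def ptes_in_budget(budget_bytes, pte_size_bytes):
    """Number of PTEs that fit in a fixed amount of kernel address space."""
    return budget_bytes // pte_size_bytes


if __name__ == "__main__":
    budget = 1024 * 1024  # hypothetical 1MB set aside for PTEs
    for label, size in (("non-PAE", NON_PAE_PTE_SIZE), ("PAE", PAE_PTE_SIZE)):
        print(label,
              ptes_per_page_table(size), "PTEs per page table,",
              ptes_in_budget(budget, size), "PTEs in the budget")
```

Whatever the real budget is on a given box, the same space holds half as many entries once each entry is 8 bytes wide.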

You could go with gflags and poolsnap to determine which device drivers are using up PTEs and drop any that are not essential. You could also set the SystemPages registry value to make more PTEs available.

 
My cluster nodes are all running SQL. We have since removed one of the nodes and rebuilt it with W2K3EE and had the same issue. So, this is not a cluster issue, obviously. But I had to prove that. Cluster was getting a bad name around here. The next thing we did was to rebuild that server again with W2KAS. The problem seems to have resolved itself with 2K.

I also added a third node to the cluster (because of the rebuild mentioned above, it's only a two-node cluster, again). This was a different model server -- a Compaq ProLiant 8500 with 4 procs and 7.5GB RAM. The AWE is enabled and seems to be functioning fine on that node. I am leaning toward my problem being a W2K3EE/AWE problem only on the HP ProLiant BL40p. I need to do more testing before I am confident with that speculation.

I will also look at the PTE tricks that xmsre mentioned.

Thanks, everyone, for your input.
 
Hi all!
Well, keeping my fingers crossed, it turns out it was a /3GB plus /PAE issue. With both switches set, the number of free PTEs dropped below 3,000. Bringing up SQL Server dropped the number lower, and as we attached the application server and ran jobs the number dropped still lower. Our paged pool and nonpaged pool memory stayed about the same, however. More PTEs must be required to map the expanded memory than standard memory. (?)
I wasn't permitted to force the number to zero to prove that was the problem. We removed the /3GB switch, enabled AWE in SQL Server, set max memory to 6GB, then rebooted. The free PTEs jumped back up to 80,000+. We've been running stable since Saturday. I'm guessing that our scheduling software, the only other application that runs on that box, is gobbling up memory. None of our other x440s exhibit this behaviour. I'm also guessing that, since the failure to obtain memory probably occurred in a driver or non-Microsoft product, it was never reported to the event log. I've seen this behaviour before with products that use Microsoft APIs.
Thanks guys for your input.
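
For anyone who wants to watch the free PTE count rather than wait for a reboot, below is a rough Python sketch that scans performance counter samples exported as CSV (for example from the Memory\Free System Page Table Entries counter) and lists the readings under a threshold. The node name, timestamps and the 5,000 threshold are all made up for illustration, and the two-column layout is an assumption about the export format. The two values simply echo the "below 3,000" and "80,000+" readings reported above.

```python
import csv
import io

# Made-up sample in a two-column CSV layout (timestamp, counter value).
# NODE1 and the timestamps are hypothetical.
SAMPLE = r'''"(PDH-CSV 4.0)","\\NODE1\Memory\Free System Page Table Entries"
"01/01/2004 10:00:00.000","81234.000000"
"01/01/2004 10:00:15.000","2950.000000"
'''


def low_pte_samples(csv_text, threshold):
    """Return (timestamp, free_ptes) pairs whose value is below threshold."""
    low = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # blank or malformed line
        try:
            value = float(row[1])
        except ValueError:
            continue  # header row or non-numeric sample
        if value < threshold:
            low.append((row[0], int(value)))
    return low


if __name__ == "__main__":
    # 5000 is an assumed alert level, not an official limit.
    for timestamp, free_ptes in low_pte_samples(SAMPLE, 5000):
        print(timestamp, "free system PTEs low:", free_ptes)
```

Logging the counter while SQL Server and the application jobs start up would show whether the count slides toward zero before a reboot, which is the test that could not be run on the production box.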
 