
MSSQL Server 2014 Freezes every time RAM gets to 79GB

Status
Not open for further replies.

LookingToLearn

Programmer
Feb 23, 2019
2
CA
We have SQL Server 2014 Enterprise, version 12.0.2000, running on:
Windows 2012 R2 Standard 64 Bit
8x16GB Micron 16GB DDR4 2Rx4 (128 GB of RAM)
2.6GHz Intel Xeon-Haswell (E5-2690-V3-DodecaCore)
2.6GHz Intel Xeon-Haswell (E5-2690-V3-DodecaCore)

The issue we have:
For the past 4 days, at around the same time each day (give or take an hour), SQL Server has become completely unresponsive. We are not able to connect to it at all; we can RDP into the machine and watch what's happening, but we cannot connect to SQL Server. It has happened 4 times: the first two times it lasted 20 minutes, the third time 10 minutes, and today 30 minutes. At that point RAM starts to drop, and once it drops to 0 everything is back to normal.

What's interesting
The RAM almost always gets to nearly 80GB, as you can see in the attached Usage.png screenshot, and one of the cores is at 100% utilization when this happens. As time passes, the RAM used by sqlserver.exe drops little by little until it reaches 0; then the CPU that was at 100% starts working normally again and the server goes back to normal without any issues.

Insight into when this started happening
Five days ago we moved onto a more powerful server which has exactly the same specs as this server. This then happened two days in a row. After the second time, since we could not figure out what was going on, we moved onto a brand new server yet again, thinking something was wrong with the hardware. The same thing is happening on the brand new server and we have absolutely no idea why. The primary difference between the old server and these new ones is that the new ones have more RAM and more cores. The old server had 64GB of RAM and 16 cores, and ran the same versions of Windows and MSSQL.

[Attachment: Usage_vdxgvr.png (memory usage)]

[Attachment: Cores_zzkmlo.png (per-core CPU usage)]
 
Please don't cross post on multiple sites.

Original question (as far as I know)
https://www.sqlservercentral.com/Fo...rver-2014-Freezes-every-time-RAM-gets-to-79GB

And the same reply I gave you there:
I would not even try to figure out what is causing the issue: you are still on the first release of 2014 (RTM). Upgrade to the latest service pack (SP3), which fixes lots of issues.
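To confirm which build you are on, `SELECT SERVERPROPERTY('ProductVersion')` returns the version string. The Python sketch below just compares such a string against SP3. It assumes SP3 for SQL Server 2014 is build 12.0.6024.0, so double-check that against Microsoft's published build list before relying on it.

```python
# Sketch: is a SQL Server 2014 build older than SP3?
# Assumption: SQL Server 2014 SP3 is build 12.0.6024.0 (RTM is 12.0.2000).
SP3_BUILD = (12, 0, 6024)


def parse_build(version: str) -> tuple:
    """Turn a string such as '12.0.2000.8' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def needs_sp3(version: str) -> bool:
    """True if the given 12.0.x build is older than SP3."""
    build = parse_build(version)
    if build[:2] != SP3_BUILD[:2]:
        raise ValueError(f"not a SQL Server 2014 (12.0) build: {version}")
    # Compare major.minor.build only; the revision number is ignored.
    return build[:3] < SP3_BUILD
```

For the build in the original post, `needs_sp3("12.0.2000")` is `True`.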

Regards

Frederico Fonseca
SysSoft Integrated Ltd

FAQ219-2884
FAQ181-2886
 
Sorry about the cross post. I will update this thread here as well once we have 100% figured out the issue. From the looks of it, the issue was related to the McAfee virus scanner, which was scanning our SQL files. We will update shortly once we confirm, so that anyone who has a similar issue can find it on Google.
 
Thoughts:

1) Disable McAfee's scanning of the MDF and LDF files.
2) Bad memory stick perhaps?
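If you go the exclusion route, a quick way to build the list of files to exclude is to filter by extension. This is only a sketch: it assumes you already have the file paths (e.g. from `sys.master_files`), and it adds `.ndf` (secondary data files) alongside MDF and LDF. It does not touch McAfee itself; the exclusions still have to be entered in the scanner's own policy.

```python
from pathlib import PureWindowsPath

# Primary data (.mdf), secondary data (.ndf) and transaction log (.ldf) files.
SQL_DB_EXTENSIONS = {".mdf", ".ndf", ".ldf"}


def files_to_exclude(paths):
    """Return the paths whose extension marks them as SQL Server database files.

    PureWindowsPath parses Windows-style paths on any OS; the extension
    check is case-insensitive.
    """
    return [p for p in paths if PureWindowsPath(p).suffix.lower() in SQL_DB_EXTENSIONS]
```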


Just my $.02

"What the captain doesn't realize is that we've secretly replaced his Dilithium Crystals with new Folger's Crystals."

--Greg
 
You mentioned that the new machine has more cores. It looks like it has two CPU sockets from the Task Manager output. There's a technology used by some multi socket servers called Non-Uniform Memory Access (NUMA). What that does is divide up the physical memory on the machine to be "owned" by the processors in each socket. This allows very fast memory access as long as your thread is accessing memory owned by the processor/socket that is running it. The downside is, if it needs to access memory "owned" by a different socket, everything can slow down drastically.

Since you mentioned that things run normally until memory gets up to a certain point, this could be the problem. The more memory your process has, the more likely it will need to access memory cross-socket.
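To put rough numbers on it: 8x16GB = 128GB across two sockets. Assuming the DIMMs are split evenly (64GB per node), a process that grows to ~79GB cannot keep all of its memory on one node. The sketch below computes that lower bound. It is deliberately simplified and ignores SQL Server's own NUMA-aware memory handling and whatever the OS and other processes are using.

```python
def min_remote_fraction(process_gb: float, total_gb: float, nodes: int) -> float:
    """Lower bound on the share of a process's memory that must sit on a
    NUMA node other than its "home" node.

    Assumes RAM is split evenly across nodes, so one node can hold at most
    total_gb / nodes; anything beyond that has to live elsewhere.
    """
    local_capacity = total_gb / nodes
    spill = max(0.0, process_gb - local_capacity)
    return spill / process_gb


# 79GB on a 128GB, 2-node box: at least 15GB (about 19%) is remote.
print(round(min_remote_fraction(79, 128, 2), 3))
```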

In Task Manager, right-click on the graph and select "One Graph Per NUMA Node", or something like that. That will give you a picture of how the processing is broken out per socket in the system.

This is something that can really kill VMware performance, and there are a lot of articles out there about it. IBM, Intel, Microsoft, and others also have some good articles. I have an app we just bought that we're trying to deploy that's having performance problems, and the vendor is saying our server is too big and we have to disable 3 of the 4 sockets in it. Yes, that's a wonderful thing to hear [/sarcasm].

Anyway, something to look at. It's easy enough to test. There are instructions online on how to disable a socket, or manage NUMA configs. You can disable all but one socket and then try it again. You'd probably need to Google with specifics of the Windows version and the server make and model.

Here's some info from Microsoft regarding NUMA and SQL Server.

 
Yeah, looking closer at your Task Manager graphs, you'll notice the top 24 CPUs (first NUMA node) have little bits of random CPU usage across all of them. The bottom 24 CPUs (second NUMA node), only one CPU is getting any work and it's pegged. Starting to smell like a NUMA issue.
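The pattern described here (one node with light, spread-out load; the other with a single CPU pegged and the rest idle) can be checked mechanically if you capture per-CPU utilization. A sketch, assuming 24 logical CPUs per node (12 cores with Hyper-Threading per socket) and thresholds of 95% and 5% that are my own picks, not anything standard:

```python
def node_summaries(cpu_util, cpus_per_node):
    """Split per-logical-CPU utilization (percent) into NUMA nodes and return,
    per node, (busiest CPU, average of the remaining CPUs)."""
    summaries = []
    for start in range(0, len(cpu_util), cpus_per_node):
        node = cpu_util[start:start + cpus_per_node]
        busiest = max(node)
        others = sorted(node)[:-1]  # drop the single busiest CPU
        avg_others = sum(others) / len(others) if others else 0.0
        summaries.append((busiest, avg_others))
    return summaries


def looks_single_thread_pegged(busiest, avg_others, peg=95.0, idle=5.0):
    """Hypothetical heuristic: one CPU near 100% while the rest are nearly idle."""
    return busiest >= peg and avg_others <= idle
```

With made-up sample values (node 0 at 3% everywhere, node 1 with one CPU at 100% and the rest at 0%), only node 1 is flagged.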

I would still spend some time looking into your Application and System event logs, just to eliminate anything else. You don't have to go back too far, just a little before the problem manifests itself.
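If you export the Application and System logs (Event Viewer can save them to CSV), narrowing them to the window just before an incident is easy to script. This sketch assumes the entries are already parsed into `(timestamp, source, message)` tuples; the parsing step and the 60-minute default are assumptions, not anything Event Viewer provides.

```python
from datetime import datetime, timedelta


def events_before(events, incident, lookback_minutes=60):
    """Keep (timestamp, source, message) tuples whose timestamp falls in
    [incident - lookback_minutes, incident]."""
    start = incident - timedelta(minutes=lookback_minutes)
    return [e for e in events if start <= e[0] <= incident]
```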

 