iisreset's needed a few times each day

Status
Not open for further replies.

andegre

MIS
Oct 20, 2005
275
US
Hello all, I realize my question that I'm going to ask will not be able to be "solved" per se, but I would just like some suggestions on how to find the problem. This explanation may be quite long, so I apologize. Here it goes...

Background Info:
Servers: I have a large production environment that is load balanced between 8 physical Windows Server 2003 servers. We are using Microsoft's NLB load balancing software. Each server has the EXACT same code loaded (obviously) and is running my website.
Application: This is a VERY large application (a couple million lines of code in total). It consumes a fair amount of memory: each server usually has about 600 to 800 MB in the w3wp worker process and usually sits at 20 to 50% CPU utilization.
Database: All 8 app servers connect to a single (separate) database server, which runs SQL Server 2005.

Problem: At least 3 to 4 times throughout the day, IIS on one of the servers starts to "hang" or stops responding. When I get a call that "the system is running slow" or something like that, I run a little bat file that loads all 8 "localhosts" on my build server. Whichever login page does not load at all tells me which server is causing the problem. I then go to that server, do an "nlb stop", then an "iisreset", open up the site on that server to make sure it loads (it always does after the reset), and then do an "nlb start", and the problem is fixed. This process repeats itself 3 to 4 times throughout the day. Sometimes, after getting one of the servers reset, I run my script again and a different server is hanging, so I have to repeat the process for that one. On about half of these calls, a second server starts hanging after I fix the first one.
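For anyone wanting to reproduce the "load all 8 login pages and see which one hangs" check, here is a minimal sketch in Python rather than a bat file. The host names (web01 to web08) and the /login.aspx path are made up for illustration, since the post does not give the real ones, and the 10-second timeout is an arbitrary choice.

```python
import urllib.error
import urllib.request

# Hypothetical node names and login path; substitute the real ones.
SERVERS = [f"web{i:02d}" for i in range(1, 9)]
LOGIN_PATH = "/login.aspx"


def probe(url, timeout=10.0):
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, HTTP error codes, and timeouts.
        return False


def find_hung(servers, path=LOGIN_PATH, check=probe):
    """Return the servers whose login page did not load."""
    return [s for s in servers if not check(f"http://{s}{path}")]
```

Calling find_hung(SERVERS) from the build server would return the list of nodes whose login page failed to load in time. The check parameter is only there so the probe can be swapped out for testing.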

Question: Does anyone know of any tools that could be used to monitor either the w3wp worker process or IIS in general, to help me figure out why IIS keeps getting hosed up (yes, technical term there)? This is getting very frustrating, as I'm not really able to take time off because I'm afraid the servers will act up again.

Thanks for any replies!
 
What I've done is use Servers Alive (or any standard monitoring software) to hit the website on each server in our farm. If I get a failure, the needed commands to restart IIS are run automatically.

Each server gets checked every couple of minutes when everything else gets checked.
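As a rough sketch of what "run the needed commands automatically" could look like, the snippet below reuses the nlb stop / iisreset / nlb start sequence quoted in the first post. It is not how Servers Alive does it. The verify callback is a hypothetical hook (for example, a function that requests the login page locally), and the script is assumed to run on the affected node itself.

```python
import subprocess


def reset_node(verify, run=subprocess.run):
    """Take the node out of NLB, restart IIS, and rejoin only if verify() passes.

    Returns True if the node was restarted and put back into rotation.
    """
    # Commands as quoted in the original post.
    for cmd in (["nlb", "stop"], ["iisreset"]):
        if run(cmd).returncode != 0:
            return False  # leave the node out of rotation for a human to look at
    if not verify():
        return False  # site still not loading, so do not rejoin the cluster
    return run(["nlb", "start"]).returncode == 0
```

The node only rejoins the cluster after the site has been confirmed to load, which mirrors the manual procedure described above.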

Denny
MVP
MCSA (2003) / MCDBA (SQL 2000)
MCTS (SQL 2005 / SQL 2005 BI / SQL 2008 DBA / SQL 2008 DBD / SQL 2008 BI / MWSS 3.0: Configuration / MOSS 2007: Configuration)
MCITP (SQL 2005 DBA / SQL 2008 DBA / SQL 2005 DBD / SQL 2008 DBD / SQL 2005 BI / SQL 2008 BI)
 
I appreciate the response, but I'm not looking to monitor them; I'm trying to figure out what causes the issue. Something is making IIS "hiccup" and that's what I want to find.

I've checked the Event Log and there is nothing in there. I'm still checking the IIS logs, but haven't seen anything yet.

We are currently using SCOM to monitor all of our servers in our production environment so we have a monitoring tool already in place.
 
It's probably a bug in the .NET code that's causing some sort of loop or there's a memory leak.

Denny
 
Ah, didn't think about a memory leak. Thanks! Now I can run my ANTS Memory Profiler program...
 
Will your app support multiple worker processes?

If so, try adding a second per app pool (assuming you have the memory to do so). Since employing several in each of our busier application pools, we've noticed a big reduction in the number of iisresets required.

We've never been able to locate the problem that causes them but suspect it's down to a busy process getting stuck and we simply chalked it up to M$ and live with it.
 
Thanks for the response, I'll look at what it would take to incorporate a second worker process.
 
Thanks Dinkytoy. I've briefly read about setting up multiple worker processes, but what are some characteristics of my program that I need to look for that will tell me whether or not my website will be "ok" running in this mode? The only thing I've read so far that would say NOT to do it is if my program accessed a single file for information. Multiple processes would then all try to access the same file, which would cause contention, etc.

Any more help would be appreciated, I'll continue my research.
 
I don't know tbh. We worked on a 'will it work'; 'don't know'; 'let's try it' basis in a UAT environment. It worked ok and was then fine on live.
 
Denny wrote: "It's probably a bug in the .NET code that's causing some sort of loop or there's a memory leak."

I tried running ANTS Memory Profiler and it crashed every time I tried to get the second memory snapshot. On to Plan B.

Next, I tried using Microsoft's DebugDiag tool. We brought MS in to help us figure out why our performance was bad when we did our .NET 3.5 upgrade (from .NET 1.1), and that is the tool they used. I remember looking at the reports, and it was pretty easy to see what the problem methods were.

Anyway, does anyone have any experience with using DebugDiag? After generating the dumps, I'm not able to read the reports (or they aren't showing me anything useful), so I'm not sure if I'm doing it right. I followed the procedure in this link, which explains how to use it for memory leaks (http://support.microsoft.com/kb/919790/).

Also, I remember when MS helped us the first time, they said they needed 2 dumps to be able to analyze (makes sense) so they had something to compare to. Well, that link doesn't say anything about that so I'm kind of stumped on what to do from here, or how to do it "the right way". If anyone can point me in the right direction on this, that'd be great.
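On the "2 dumps to compare" point, the idea is simply that a single snapshot shows what is in memory, while the difference between two shows what is growing. A toy sketch of that comparison follows. This is not how DebugDiag works internally, and the type names and byte counts are made up purely for illustration.

```python
def growth_between(first, second):
    """Return (type_name, bytes_gained) pairs, largest growth first."""
    deltas = {
        name: second.get(name, 0) - first.get(name, 0)
        for name in set(first) | set(second)
    }
    return sorted(
        ((name, gained) for name, gained in deltas.items() if gained > 0),
        key=lambda item: item[1],
        reverse=True,
    )


# Made-up per-type heap totals (bytes) from two hypothetical snapshots.
snapshot_1 = {
    "System.String": 40_000_000,
    "System.Data.DataTable": 5_000_000,
    "System.Byte[]": 12_000_000,
}
snapshot_2 = {
    "System.String": 42_000_000,
    "System.Data.DataTable": 95_000_000,
    "System.Byte[]": 11_000_000,
}

for name, gained in growth_between(snapshot_1, snapshot_2):
    print(f"{name}: +{gained / 1_000_000:.0f} MB")
```

With these invented numbers, the type that grew the most between snapshots floats to the top, which is the kind of lead a leak investigation is looking for.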
 
To update everyone on this subject...

The boss had us get Microsoft involved with this issue. After requesting performance logging, MS suggested implementing the Web Garden approach.

So, today is the first day for the web garden. I've set the number of worker processes to 8 and the maximum used memory (not the virtual memory setting) to 350 MB. (By the way, there are also 8 load-balanced servers, so essentially we have 64 worker processes running our application.)

Now, after checking the event logs of each server, we are averaging a worker process recycle approximately every 3 minutes. In everyone's experience with Web Gardens, is this recycle rate too high, or just fine? My initial thought is that it's too high, so I want to raise the Max Mem Used setting to 375 or 400 MB, but I don't want the server to start throwing "System.OutOfMemoryException" errors.

If anyone has any suggestions, or a "don't worry about it, that's normal", that would be great.

Thanks
 
Sounds like the limits you have are too low.

Turn the recycle off for a bit and set up perfmon to monitor Process -> Private Bytes (or Virtual Bytes for virtual memory) for your w3wp processes to see what your usage actually is. Don't trust Task Manager.

We've started to see mass OutOfMemory exceptions at my place now, and I end up recycling the app pools manually a lot more than I'd like. I set up this monitoring myself today to try to set some auto recycling at a reasonable level.
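Once the counter has been logged for a while, something like the sketch below could pull out the peak Private Bytes per w3wp instance, as a starting point for choosing a recycle limit. It assumes the log was saved as CSV with a timestamp in the first column and one counter per remaining column. The sample rows, the WEB01 host name, and the dates are invented for illustration.

```python
import csv
import io


def peak_private_bytes(csv_text):
    """Return {counter_path: peak value in MB} from a perfmon-style CSV log."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    counters = header[1:]  # first column is the timestamp
    peaks = {name: 0.0 for name in counters}
    for row in reader:
        for name, cell in zip(counters, row[1:]):
            try:
                value = float(cell)
            except ValueError:
                continue  # blank or non-numeric sample
            peaks[name] = max(peaks[name], value)
    return {name: peak / (1024 * 1024) for name, peak in peaks.items()}


# Invented sample in the assumed layout; values are in bytes.
SAMPLE = r'''"(PDH-CSV 4.0)","\\WEB01\Process(w3wp)\Private Bytes"
"10/20/2009 10:00:00.000","629145600"
"10/20/2009 10:00:15.000","734003200"
"10/20/2009 10:00:30.000"," "
'''

for counter, peak_mb in peak_private_bytes(SAMPLE).items():
    print(f"{counter}: peak {peak_mb:.0f} MB")
```

A limit set somewhat above the observed steady-state peak should recycle only when memory is genuinely climbing, rather than every few minutes.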
 
Thanks Dinky...

I've since set my Max Mem to 725 MB, which makes each process recycle about once per day, and I'm now running only 4 processes instead of 8. It's still performing about the same as before I set up the web garden, but it's a HUGE improvement over the previous Web Garden settings.

Now I have to figure out why we get SOAP errors once per day, every day, on one server only...
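For comparison, here is the arithmetic behind the two web garden configurations mentioned in this thread, assuming the 725 figure is in MB like the earlier 350 setting. The limit is a recycle trigger per worker process, so the product is only a rough ceiling per server, not a guaranteed cap.

```python
SERVERS = 8  # load-balanced nodes, per the original post


def ceiling_mb(processes, limit_mb):
    """Rough per-server memory ceiling: worker processes x recycle limit."""
    return processes * limit_mb


configs = {
    "first web garden": (8, 350),
    "revised web garden": (4, 725),
}

for label, (processes, limit_mb) in configs.items():
    print(
        f"{label}: {processes} x {limit_mb} MB = "
        f"{ceiling_mb(processes, limit_mb)} MB per server, "
        f"{processes * SERVERS} worker processes in total"
    )
```

The per-server ceiling is nearly the same in both cases (2800 MB versus 2900 MB), but each process in the revised setup has roughly twice the headroom before it recycles.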
 