Hello all, I realize the question I'm about to ask can't really be "solved" per se, but I'd like some suggestions on how to track down the problem. This explanation may be quite long, so I apologize in advance. Here goes...
Background Info:
Servers: I have a large production environment that is load balanced across 8 physical Windows Server 2003 servers using Microsoft's Network Load Balancing (NLB) software. Each server has the EXACT same code loaded (obviously) and is running my website.
Application: This is a VERY large application (a couple million lines of code in total). It uses a fair amount of resources: on each server, the w3wp worker process usually takes about 600 to 800 MB of memory and 20 to 50% of the CPU.
Database: All 8 app servers connect to a single, separate database server running SQL Server 2005.
Problem: At least 3 to 4 times a day, IIS on one of the servers starts to "hang" and stops responding. When I get a call that "the system is running slow" or something similar, I run a small bat file on my build server that opens all 8 "localhosts". Whichever login page does not load at all tells me which server is causing the problem. I then go to that server, do an "nlb stop", then an "iisreset", open up " to make sure it loads (it always does after the reset), and then do an "nlb start", and the problem is fixed. This repeats 3 to 4 times a day. Sometimes, after I reset one server, I run my script again and a different server is hanging, so I have to repeat the process for that one. On about half of these calls, a second server starts hanging after I fix the first one.
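For reference, the check the bat file does could also be scripted so that it names the hung server directly instead of relying on which page fails to load. Below is a minimal Python sketch, assuming each server can be reached by its own hostname; the `web01` through `web08` names and the `/login.aspx` path are hypothetical placeholders, not the real site. It requests every login page in parallel with a timeout and lists the servers that fail to answer.

```python
"""Minimal sketch: report which NLB node's login page is not responding.

Assumes each server can be reached directly by its own hostname, bypassing
the NLB virtual IP. The hostnames and path below are hypothetical.
"""
import concurrent.futures
import urllib.error
import urllib.request

# Hypothetical per-server URLs; replace with the real hostnames and login page.
SERVERS = [f"http://web{n:02d}/login.aspx" for n in range(1, 9)]

# Ignore any configured HTTP proxy so each request goes straight to the server.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def check(url: str, timeout: float = 10.0) -> tuple[str, bool, str]:
    """Fetch url once and return (url, ok, detail); a timeout counts as a hang."""
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            resp.read()  # make sure the whole page arrives, not just the headers
            return url, resp.status == 200, f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:  # server answered with an error code
        return url, False, f"HTTP {exc.code}"
    except OSError as exc:  # URLError, connection refused, or read timeout
        return url, False, f"no response: {exc}"


def find_unresponsive(urls: list[str], timeout: float = 10.0) -> list[tuple[str, str]]:
    """Check all urls in parallel and return (url, detail) for each failure."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=max(1, len(urls))) as pool:
        results = list(pool.map(lambda u: check(u, timeout), urls))
    return [(url, detail) for url, ok, detail in results if not ok]


if __name__ == "__main__":
    failures = find_unresponsive(SERVERS)
    for url, detail in failures:
        print(f"UNRESPONSIVE: {url} ({detail})")
    if not failures:
        print("All servers responded.")
```

Run on a schedule with the time of each failure logged, a script like this could also give a record of exactly when each server starts hanging, which can then be lined up against the event logs on that server.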
Question: Does anyone know of any tools that could be used to monitor either the w3wp worker process or IIS in general, to help me figure out why IIS keeps getting hosed up (yes, technical term there)? This is getting very frustrating, as I can't really take time off because I'm afraid the servers will act up again.
Thanks for any replies!