Need to understand the User Connection parameter on POA NLM screen

BobSargent · Oct 31, 1998

We are running GroupWise v5.23, all client/server, and are experiencing disconnects nearly every afternoon at about the same time. Users are frequently disconnected with error 8911. Watching the NLM log file, at about the time the first user reports getting disconnected, I can see the GroupWise POA trying to remake several hundred connections. 
 
On the POA NLM screen is an entry marked User Connections. This parameter climbs steadily throughout the day to a peak of generally around 2300 connections. What does this number indicate? I have done extensive reading about the POA in particular, and can find no references to what the User Connections count actually reflects. Should that number be continually growing as it is? Or should it rise and fall with the number of actual client connections? When connections time out, should this number decrease? How can I determine if connections are timing out as they are supposed to? 
 
When disconnects occur, there seems to be a 50% chance the client workstation will lock up. Needless to say this is very distressing to our users. 
 
Any help would be greatly appreciated.

RPITERA · Nov 26, 1998

Found this in Novell's knowledgebase; it seems to describe your problem. Let me know if I can be of further assistance. 
 
Robert Pitera 
rpitera@pingry.k12.nj.us 
 
 
Issue 
This customer reported 8911 errors, IP bottleneck, client slow down in all aspects and sometimes even workstation lockups. The customer was running 700 users via client/server on the server. They averaged about 400 active users at any one time. 
 
Two things were performed to remedy the problem. 
 
I. He was running a GW 4 SMTP gateway. This still had the default setting of 5 fin-wait connections before it triggered a syn_attack (see TID 2929448). Because of this, the customer had disabled the syn_attack feature on the server. With syn_attack disabled, when the stack had 32 half-open connections, the server immediately purged the connections. All clients would then try to re-establish their connections. The extra overhead from all of the clients compounded the bottleneck. The server got progressively slower until all IP traffic was halted. 
 
To fix the problem, we installed the GW 5.x GWIA gateway, which by default will support 32 fin-wait connections before triggering the syn_attack routine. We also re-enabled the syn_attack routine. By doing this, if the 32 connection limit is reached, the server will discard IP traffic until the connections can be processed and established. This prevents all clients from trying to re-establish connections at the same time. 
 
II. We re-configured the server with the following settings: 
 
1. Directory Cache buffer Non-referenced Delay = 30 sec (from 5.5 sec) 
This setting will decrease processor overhead and I/O traffic. It determines how often the Directory Cache buffer is refreshed. Every refresh requires a new disk read and a write to memory. By increasing the value to 30 seconds, the administrator decreases how often the refresh takes place. With this setting, there is little danger of losing data. As new files are added to a directory structure, the buffer is updated dynamically. The refresh routine exists to capture any file that, for whatever reason, did not get added to the buffer. 
 
2. Min Directory Cache Buffers = 1000 (was 500) 
By increasing this value, the buffer is already established and no additional resources are required to allocate more buffer space on the fly. This can eliminate processor and I/O bottlenecks. 
 
3. Max Directory Cache Buffers = 4000 (was 2000) 
This setting protects the system from using too much memory for Directory Cache Buffers. 
 
4. Read Ahead Enabled 
The Read Ahead feature significantly increases performance on NetWare servers. The Read Ahead feature predicts what files are required next and loads them in memory ready for access. 
 
5. Read Ahead LRU sitting Time Threshold = 60 sec (from 10 sec). 
This setting governs the Read Ahead mechanism. If the LRU (Least Recently Used) sitting time is below the specified threshold, Read Ahead is not used. LRU is an algorithm that is used for memory block / page replacement. An 'LRU list' identifies the least recently used cache blocks (blocks that have been in cache the longest time without being accessed) and flags those for reuse first, which makes for a more efficient caching implementation. The reason for the above setting is that if there is not enough memory to serve data from available cache, Read Ahead will take up memory and processor time without increasing performance. If Read Ahead is not helpful, it makes sense not to spend the resources. This setting can be configured up to 1 hour. In general terms, if the LRU sitting time is 20 minutes or better, the system probably has sufficient memory. This setting could be effective anywhere from a minute to possibly 5 minutes. Be aware that this disables Read Ahead, which usually is not recommended. If this option comes into play a lot, it is probably time to add more memory. 
 
5. Max Concurrent Disk Cache Writes = 300 
This setting is a way of changing the Read/Write ratio through the application layer. GroupWise is usually more write intensive than read intensive. These ratios really should be set through the controller card. If the controller card does not support these types of settings, this option can be used. By increasing the number, the Write/Read ratio is increased (or the Read/Write ratio is decreased). 
 
6. Change the wpcsin and wpcsout directories under the post office directory and the mslocal directory under the domain directory to immediate purge. Be sure to include all subdirectories under the above mentioned directories. TID 2920356 discusses methodology for performing this task. 
Many files are written to and deleted from these directories. They should be purged to keep the volumes clean. If the administrator is running suballocation on the volume, the directory should have at least 30% of disk space available at all times. That space must be truly free, not held as purgeable blocks. If the space is free but resides as "purgeable blocks," utilization will be affected dramatically. By setting immediate purge on high traffic directories, the cleanup tasks will be automated for the administrator. See TID 1005436 for more information on suballocation and high utilization. 
 
7. Max File Locks = 20,000 (was 10,000). 
Although GroupWise does much more record locking than file locking, if there are a lot of users on the system, it is wise to allocate enough file locks. This does require memory and should not be overused. 
 
8. Max Record Locks = 100,000 (was 2,000). 
GroupWise performs a lot of record locks. If there are a lot of users on the system, it is wise to allocate enough record locks. This does require memory and should not be overused. 
 
9. Min Service Process to 50 
Service Processes are dynamic. By pre-allocating them, less overhead is required to allocate them on the fly. As long as there is sufficient memory, this number can be increased. A good rule of thumb is to monitor the server during peak times. Set the Min Service Processes to whatever the current service processes are during peak times. 
 
10. Max Service Process to 100 
This also takes up resources. Monitor this setting in monitor.nlm. If the current service processes begin to approach the maximum, increase the maximum service processes. 
 
11. New Service Process Wait Time = .3 sec (was 2.2 sec). 
This setting can drastically increase performance. When a Service Process is required, a new one can be created quickly. With the default setting of 2.2 seconds, the theory is that if the system waits long enough, a process will become free. If there is sufficient memory, there is no harm in creating a process immediately upon the initial request. 
 
12. Min Packet Receive Buffers = 1400 (2 per user). 
Any request that is processed uses a Packet Receive Buffer. This includes all NCP requests, SAPs, RIPs, TCP packets, etc. If the server is bombarded with requests and there are not enough packet receive buffers, the system will bottleneck and start dropping requests. The result is loss of connection to users, loss of server to server connections, slowness, etc. Monitor the current packet receive buffers during peak times and make sure that the minimum is set to that value so that there are enough packet receive buffers at all times. Remember, this also takes up memory. Be sure to have sufficient memory on the server. 
 
13. Max Packet Receive Buffers = 4000 
This protects the server against too many packet receive buffers taking up too much memory. 
 
14. New Packet Receive Buffer Wait Time = .1 Sec 
If the server has sufficient memory, this setting can significantly increase productivity. As with service processes, the server will immediately spawn a new buffer without waiting to see if one becomes available first. 
 
15. No SNMP on the POA, MTA and ADA. 
This feature requires quite a bit of I/O and processor traffic. If SNMP is not being used (through ManageWise or some other SNMP server), turning this unneeded feature off could help performance. 
 
16. Turn off LandAttack on the TCPIP stack. 
This feature protects the TCPIP stack against LandAttacks. LandAttacks are packets sent to the server with the same source and destination. The packets get into a loop and can bring the server down. If the server in question has no access to the outside world, the chance of a packet doing this is extremely minimal. By turning this unneeded feature off, overhead is reduced and IP packets can be processed faster. 
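
The sizing rules of thumb given above for service processes and packet receive buffers can be sketched in a few lines of Python. This is only an illustration, not a Novell tool: the function names are made up, and the 90% cutoff for "approaching the maximum" is an assumption, since the text gives no number.

```python
def min_packet_receive_buffers(users, buffers_per_user=2):
    """Minimum Packet Receive Buffers at the ratio of 2 per user used above."""
    return users * buffers_per_user


def suggest_service_processes(peak_current, configured_max, headroom=0.9):
    """Apply the Min/Max Service Processes rules of thumb.

    peak_current is the current service process count seen in monitor.nlm
    at peak time. headroom is a hypothetical cutoff for "approaching the
    maximum"; it is not a value from the text.
    """
    suggested_min = peak_current                       # min = peak-time current count
    raise_max = peak_current >= headroom * configured_max
    return suggested_min, raise_max


print(min_packet_receive_buffers(700))      # 1400, matching the setting for 700 users
print(suggest_service_processes(50, 100))   # (50, False)
print(suggest_service_processes(95, 100))   # (95, True)
```

The numbers fed in should come from watching the server at peak times, as the text recommends, not from guesses.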
 
NOTE1: Many of the options above warn about having enough memory. Each additional buffer allocated takes up about 4 KB. Each service process requires about 16 KB. The best way to determine whether memory is sufficient is to watch the LRU count and the Available Cache Buffers. If these numbers drop (LRU below 20 minutes and Cache Buffers below 40%), more memory is probably required. 
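
As a rough worked example of NOTE1, the Python sketch below multiplies out the approximate per-item costs for the packet receive buffer and service process settings listed above. It assumes the "about 4k" per-buffer figure applies to packet receive buffers, and it treats either threshold being crossed as a warning sign; both are assumptions, and the function names are made up for illustration.

```python
BUFFER_KB = 4             # approx. memory per additional buffer (NOTE1)
SERVICE_PROCESS_KB = 16   # approx. memory per service process (NOTE1)


def reserved_kb(packet_receive_buffers, service_processes):
    """Approximate memory taken by the given buffers and service processes."""
    return (packet_receive_buffers * BUFFER_KB
            + service_processes * SERVICE_PROCESS_KB)


def probably_needs_memory(lru_minutes, cache_buffers_percent):
    """True if either NOTE1 threshold is crossed (assumption: either one counts)."""
    return lru_minutes < 20 or cache_buffers_percent < 40


# Minimums from the list: 1400 packet receive buffers, 50 service processes
print(reserved_kb(1400, 50))           # 6400 (KB)
# Maximums from the list: 4000 packet receive buffers, 100 service processes
print(reserved_kb(4000, 100))          # 17600 (KB)
print(probably_needs_memory(25, 55))   # False
print(probably_needs_memory(12, 55))   # True
```

These are ballpark figures only; directory cache buffers and lock tables also take memory and are not counted here.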
 
NOTE2: TID 2943356 is another great TID on general server optimization.

RPITERA · Dec 3, 1998

Just wondering...did this help?
