Socket Processes

Status
Not open for further replies.

costiles

Technical User
Jun 14, 2005
117
US
In our application we use sockets to access a Universe database. The socket processes are all owned by one user, so we have set the number of processes per user high enough to let that one user start them all. Up to about 50 processes accessing the database, everything is fine. Above 50, performance really starts to bog down, and I/O appears to be the cause. Does anyone know of any parameters that could be tuned so the processes get the resources they need once this apparent 50-process limit is reached?
 
Check the number of AIO servers:

smitty aio

Try increasing it if it is not enough.

I have a good document about AIO; let me dig it out and I will let you know about it.

Regards,
Khalid
 
Thank you so much for this posting. I have changed the parameter on our test box to 10 maxservers per disk. I also set the minimum to 10. Is there a problem with setting the minimum to 10? When would the minimum and the maximum go into effect?
 
The default minimum number of servers configured when AIO is enabled is 1, and in most cases it is better to leave it at the default, since the AIO kernel extension will start additional servers as needed.

It is recommended that maxservers be set to at least 10*(number of disks accessed asynchronously).

Regards,
Khalid
 
Refer to the links above for more info

Code:
How Many AIO Servers Am I Currently Using?
To determine how many POSIX AIO Servers are currently running, type the following information on the command line:

pstat -a | grep posix_aioserver | wc -l 
Note:
You must run this command as the root user.
To determine how many LEGACY AIO Servers are currently running, type the following information on the command line:

pstat -a | egrep ' aioserver' | wc -l  
Note:
You must run this command as the root user.
If the disk drives that are being accessed asynchronously are using either the Journaled File System (JFS) or the Enhanced Journaled File System (JFS2), all I/O is routed through the AIOs kprocs.

If the disk drives that are being accessed asynchronously are using a form of raw logical volume management, then the disk I/O is not routed through the AIOs kprocs. In that case the number of servers that are running is not relevant.

However, if you want to confirm that an application that uses raw logical volumes is taking advantage of AIO, you can disable the fast path option using System Management Interface Tool (SMIT). When this option is disabled, even raw I/O is forced through the AIOs kprocs. At that point, the pstat command listed in the preceding discussion works. Do not run the system with this option disabled for any length of time. The option provides a way to confirm that the application is working with AIO and raw logical volumes.

At releases earlier than AIX 4.3, the fast path is enabled by default and cannot be disabled.

How Many AIO Servers Do I Need?
Here are some suggested rules for determining what value to set maximum number of servers to:

The first rule suggests that you limit the maximum number of servers to a number equal to ten times the number of disks that are to be used concurrently, but not more than 80. The minimum number of servers should be set to half of this maximum number. 
Another rule is to set the maximum number of servers to 80 and leave the minimum number of servers set to the default of 1 and reboot. Monitor the number of additional servers started throughout the course of normal workload. After a 24-hour period of normal activity, set the maximum number of servers to the number of currently running AIOs + 10, and set the minimum number of servers to the number of currently running AIOs - 10. 
Note:
In some environments, you might see more than 80 AIOs kprocs running. If so, consider the third rule that follows.

A third suggestion is to take statistics using vmstat -s before any high I/O activity begins, and again at the end. Check the field iodone. From this you can determine how many physical I/Os are being handled in a given wall clock period. Then increase the maximum number of servers and see if you can get more activity or event completions (iodones) in the same time period.
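
As a rough illustration of this third suggestion, here is a minimal Python sketch that compares two saved vmstat -s snapshots. It assumes the counter appears on a line of the form "<number> iodones"; the function names are made up for this example, so check the label against your own output first.

```python
import re


def iodone_count(vmstat_s_text):
    """Return the iodones counter found in the text of `vmstat -s`.

    Assumption: the counter is printed as '<number> iodones' on its own line.
    """
    for line in vmstat_s_text.splitlines():
        match = re.match(r"\s*(\d+)\s+iodones\s*$", line)
        if match:
            return int(match.group(1))
    raise ValueError("no iodones line found")


def iodones_per_second(before_text, after_text, elapsed_seconds):
    """Average number of completed physical I/Os per second between snapshots."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    completed = iodone_count(after_text) - iodone_count(before_text)
    return completed / elapsed_seconds
```

Take one pair of snapshots per maxservers setting over the same length of time. A higher rate for the same workload suggests the extra servers are helping.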

This is from one of the links (IBM site)
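
The first two rules quoted above can be written down as a small Python sketch. The function names are invented for illustration, and clamping the minimum to 1 (the default) when fewer than 10 servers are running is an assumption, not something the IBM text states.

```python
def rule_one(disks, cap=80):
    """Rule 1: ten servers per concurrently used disk, at most 80.

    Returns (minservers, maxservers); the minimum is half the maximum.
    """
    if disks < 1:
        raise ValueError("need at least one disk")
    max_servers = min(10 * disks, cap)
    return max_servers // 2, max_servers


def rule_two(running, margin=10):
    """Rule 2: after 24 hours of normal load with max=80 and min=1,
    size around the number of AIO servers actually running.

    Assumption: the minimum is never set below 1.
    """
    if running < 0:
        raise ValueError("running must not be negative")
    min_servers = max(1, running - margin)
    return min_servers, running + margin
```

For example, rule_one(4) gives (20, 40) for four disks, and rule_two(35) gives (25, 45) for 35 running servers.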

You didn't mention which version of AIX you are using. Are you running Oracle on it? Just wondering.

How did the system react after you changed those values? Is it better now? If not, there might be something else I can direct you to look at.

Regards,
Khalid
 
Thanks again for your valuable post. I will not be able to test thoroughly until Monday. I will let you know then.
 
Khalid,
The database that we are running is Universe, which is not multithreaded. Do you think this will help in that environment?
 
Yes, we just have to troubleshoot this step by step. First we need to rule out AIO as the problem by changing the number of AIO servers. Then we look at other things like fsbufs or pbufs.

Are you using jfs or jfs2?

Could you list the output of the following commands before and after applying the load?

vmstat -v

lvmo -a

netstat -m

iostat -A 1 5

Try this command while the load is there as well:

iostat -A 1 5
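
One way to compare the before and after snapshots, sketched in Python. The "blocked with no ..." counters in vmstat -v are cumulative since boot, so their growth during the load matters more than the absolute values. The function names are hypothetical, and the parser assumes each line is a number followed by a label.

```python
import re


def parse_vmstat_v(text):
    """Map each label in `vmstat -v` output to its numeric value."""
    counters = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\d+(?:\.\d+)?)\s+(\S.*?)\s*$", line)
        if match:
            raw, label = match.groups()
            counters[label] = float(raw) if "." in raw else int(raw)
    return counters


def blocked_io_growth(before_text, after_text):
    """Return how much each 'blocked with no ...' counter grew between snapshots."""
    before = parse_vmstat_v(before_text)
    after = parse_vmstat_v(after_text)
    return {
        label: value - before.get(label, 0)
        for label, value in after.items()
        if "blocked with no" in label
    }
```

Save the output of vmstat -v to a file before the load and again afterwards, then pass the two texts to blocked_io_growth. A counter that keeps climbing under load points at the buffer type named in its label.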

Regards
Khalid
 
Khalid,
Will run those commands before and after, and iostat -A during the test. We are scheduling another test for tomorrow; I will post the results. Thank you so much for the help!
 
Good Luck.

I will be waiting for your list

Regards,
Khalid
 
Khalid:
These are the stats from a customer running our application with asynchronous I/O not enabled. I don't think I have the kind of asynchronous I/O that you are referring to. Universe is an asynchronous application in that multiple processes can be accessing the same file at the same time.
clt148/ # vmstat -v
4030463 memory pages
3823186 lruable pages
3020 free pages
2 memory pools
726114 pinned pages
80.0 maxpin percentage
35.0 minperm percentage
65.0 maxperm percentage
68.9 numperm percentage
2637508 file pages
0.0 compressed percentage
0 compressed pages
61.6 numclient percentage
65.0 maxclient percentage
2358399 client pages
0 remote pageouts scheduled
368314 pending disk I/Os blocked with no pbuf
0 paging space I/Os blocked with no psbuf
23729 filesystem I/Os blocked with no fsbuf
0 client filesystem I/Os blocked with no fsbuf
839151 external pager filesystem I/Os blocked with no fsbuf
0 Virtualized Partition Memory Page Faults
0.00 Time resolving virtualized partition memory page faults
clt148/ # lvmo -a
vgname = rootvg
pv_pbuf_count = 512
total_vg_pbufs = 1024
max_vg_pbuf_count = 16384
pervg_blocked_io_count = 0
pv_min_pbuf = 512
global_blocked_io_count = 368314
clt148/ # netstat -m
3294 mbufs in use:
3241 mbuf cluster pages in use
14611 Kbytes allocated to mbufs
0 requests for mbufs denied
0 calls to protocol drain routines
0 sockets not created because sockthresh was reached

Kernel malloc statistics:

******* CPU 0 *******
By size inuse calls failed delayed free hiwat freed
32 45 265 0 0 83 10076 0
64 214 839994 0 3 234 10076 0
128 419 435579 0 9 221 5038 0
256 498 3012505 0 14 366 10076 0
512 304 18037696 0 12 328 12595 0
1024 395 330313 0 122 149 5038 0
2048 172 918795 0 124 370 7557 0
4096 67 237088 0 74 9 2519 0
8192 36 55951 0 76 12 1259 0
16384 0 15959 0 37 9 629 0
32768 0 4878 0 11 5 314 0
65536 1 2162 0 15 6 314 0
131072 2 2 0 0 15 34 0


******* CPU 1 *******
By size inuse calls failed delayed free hiwat freed
32 4 165 0 0 124 10076 0
64 37 628602 0 1 91 10076 0
128 7 43316 0 0 153 5038 0
256 17 458696 0 0 479 10076 0
512 118 5003545 0 0 466 12595 0
1024 13 55703 0 38 143 5038 0
2048 41 642624 0 7 85 7557 0
4096 0 21787 0 42 10 2519 0
8192 0 7218 0 33 6 1259 0
16384 0 2911 0 25 11 629 0
32768 0 833 0 13 7 314 0
65536 0 620 0 9 5 314 0
131072 0 0 0 0 5 16 0


******* CPU 2 *******
By size inuse calls failed delayed free hiwat freed
32 127 303 0 0 129 10076 0
64 225 826467 0 4 95 10076 0
128 87 415065 0 1 297 5038 0
256 97 3057939 0 1 527 10076 0
512 1184 18042854 0 266 968 12595 0
1024 103 322442 0 81 329 5038 0
2048 2225 906138 0 1107 171 7557 0
4096 3 241304 0 70 20 2519 0
8192 9 54836 0 50 27 1259 0
16384 512 15771 0 86 14 629 0
32768 0 5547 0 10 7 314 0
65536 0 2076 0 10 7 314 0
131072 0 0 0 0 121 247 0


******* CPU 3 *******
By size inuse calls failed delayed free hiwat freed
32 1 162 0 0 127 10076 0
64 42 626704 0 1 86 10076 0
128 20 29126 0 0 172 5038 0
256 22 426689 0 0 330 10076 0
512 119 4585418 0 0 361 12595 0
1024 22 44829 0 39 134 5038 0
2048 45 637770 0 20 69 7557 0
4096 0 18268 0 42 7 2519 0
8192 2 6493 0 34 5 1259 0
16384 0 2574 0 21 12 629 0
32768 0 479 0 13 6 314 0
65536 0 539 0 10 7 314 0
131072 0 0 0 0 15 34 0


******* CPU 4 *******
By size inuse calls failed delayed free hiwat freed
32 59 253 0 0 69 10076 0
64 332 852797 0 3 116 10076 0
128 166 592164 0 1 218 5038 0
256 212 3674512 0 1 604 10076 0
512 252 19972314 0 4 444 12595 0
1024 282 399009 0 113 274 5038 0
2048 236 932031 0 114 256 7557 0
4096 3 172952 0 95 28 2519 0
8192 26 60454 0 81 17 1259 0
16384 0 16432 0 41 15 629 0
32768 0 4804 0 19 8 314 0
65536 0 1499 0 14 7 314 0
131072 0 0 0 0 15 37 0


******* CPU 5 *******
By size inuse calls failed delayed free hiwat freed
32 1 166 0 0 127 10076 0
64 36 648076 0 1 92 10076 0
128 15 41780 0 0 145 5038 0
256 6 412650 0 0 330 10076 0
512 122 5178119 0 0 294 12595 0
1024 2 60162 0 34 142 5038 0
2048 41 662213 0 14 83 7557 0
4096 0 24953 0 36 9 2519 0
8192 0 8303 0 31 6 1259 0
16384 0 3276 0 20 12 629 0
32768 0 432 0 12 5 314 0
65536 0 463 0 9 6 314 0
131072 0 0 0 0 9 22 0


******* CPU 6 *******
By size inuse calls failed delayed free hiwat freed
32 26 231 0 0 102 10076 0
64 184 803197 0 5 200 10076 0
128 150 488328 0 2 170 5038 0
256 222 3349864 0 2 370 10076 0
512 1411 18122735 0 403 1845 12595 0
1024 219 362359 0 82 109 5038 0
2048 2221 886166 0 2052 1885 7557 0
4096 163 153016 0 52 19 2519 0
8192 32 55899 0 43 7 1259 0
16384 0 17593 0 64 462 629 0
32768 0 4484 0 4 8 314 0
65536 0 1793 0 5 6 314 0
131072 0 0 0 0 140 280 0


******* CPU 7 *******
By size inuse calls failed delayed free hiwat freed
32 3 137 0 0 125 10076 0
64 23 610403 0 1 105 10076 0
128 19 31372 0 0 141 5038 0
256 10 319019 0 0 310 10076 0
512 125 4377798 0 0 355 12595 0
1024 1 43252 0 33 135 5038 0
2048 28 619776 0 5 104 7557 0
4096 0 16648 0 47 8 2519 0
8192 0 6744 0 38 6 1259 0
16384 0 2613 0 22 14 629 0
32768 0 461 0 10 5 314 0
65536 0 463 0 9 7 314 0
131072 0 0 0 0 5 16 0

By type inuse calls failed delayed memuse memmax mapb
mbuf 3294 91727603 0 668 1686528 2714112 8
mcluster 3241 4723456 0 4152 7005184 11141120 43
socket 1512 11342685 0 205 1478144 2426112 0
pcb 115 38065 0 0 176000 220672 0
routetbl 50 1387 0 0 12288 13920 0
ifaddr 23 23 0 2 3168 3168 0
mblk 296 14544155 0 0 75776 226048 0
mblkdata 742 2433993 0 52 1237440 1391872 0
strhead 446 3149 0 5 235904 269632 0
strqueue 542 7297 0 79 555008 638976 0
strmodsw 23 23 0 2 2944 2944 0
strpoll 0 5 0 0 0 32 0
strosr 0 44929 0 0 0 3072 0
strsyncq 549 22073 0 4 139360 160480 0
streams 897 8844 0 4 242368 274048 0
devbuf 1538 2050 0 640 10486304 18874912 126
kernel tablemoun 289 116332 0 233 762464 1696352 2
spec buf 1 1 0 0 128 128 0
locking 90 90 0 6 23040 23040 0
temp 212 4998 0 1 71296 79360 7
mcast opts 0 4 0 0 0 64 0
mcast addrs 3 3 0 0 192 192 0

Streams mblk statistic failures:
0 high priority mblk failures
0 medium priority mblk failures
0 low priority mblk failures
 
Khalid, it is jfs2. I cannot control the timing of the test on our own box. It was supposed to occur today but has not so far. The above is taken from a customer who is running a similarly configured box.
 
Since you describe your application as "an asynchronous application", it might well benefit from asynchronous I/O.

That aside, I can't say much from the output listed above. It would be better if you gave me the same output at different times, so that we can at least compare the values.

Let me begin with the vmstat output.

Memory-wise, I don't see any problems at the time these values were taken:
Code:
35.0 minperm percentage
65.0 maxperm percentage
68.9 numperm percentage
61.6 numclient percentage
65.0 maxclient percentage

Moving to the disk and filesystem values: as you mentioned you are using jfs2, there is no need to worry about the filesystem buffers, but there is some pending disk I/O. It might be normal, but we can compare this value with later vmstat readings; if it has increased, you might need to increase the number of pbufs.
Code:
368314 pending disk I/Os blocked with no pbuf
23729 filesystem I/Os blocked with no fsbuf
0 client filesystem I/Os blocked with no fsbuf

The lvmo -a output shows the same value (global_blocked_io_count).
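
A small Python sketch of how the lvmo -a output posted earlier could be read. The parser is hypothetical, and the interpretation is hedged: pervg_blocked_io_count is taken to be the count for the listed volume group only and global_blocked_io_count the system-wide count.

```python
def parse_lvmo(text):
    """Turn 'key = value' lines from `lvmo -a` into a dict (ints where possible)."""
    settings = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        settings[key] = int(value) if value.isdigit() else value
    return settings


# Output copied from the customer box earlier in the thread.
LVMO_OUTPUT = """\
vgname = rootvg
pv_pbuf_count = 512
total_vg_pbufs = 1024
max_vg_pbuf_count = 16384
pervg_blocked_io_count = 0
pv_min_pbuf = 512
global_blocked_io_count = 368314
"""

info = parse_lvmo(LVMO_OUTPUT)
# I/Os blocked for lack of pbufs that are not attributed to this volume group.
blocked_elsewhere = info["global_blocked_io_count"] - info["pervg_blocked_io_count"]
print(info["vgname"], "blocked here:", info["pervg_blocked_io_count"],
      "blocked elsewhere:", blocked_elsewhere)
```

Under that reading, rootvg itself shows no blocked I/O, so the pending I/Os would belong to another volume group. Running lvmo -v with the name of the data volume group and -a should show which one.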

From the netstat -m output, it seems that the network adapters are handling the socket requests very well:
Code:
0 requests for mbufs denied
0 sockets not created because sockthresh was reached

Many socket calls, but none delayed!
Code:
socket             1512  11342685      0     205 1478144 2426112     0
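
To read a row like this without counting columns by eye, here is a minimal Python sketch. It assumes the seven numeric columns follow the "By type" header shown in the netstat -m output above (inuse, calls, failed, delayed, memuse, memmax, mapb); the function name is made up for the example.

```python
COLUMNS = ("inuse", "calls", "failed", "delayed", "memuse", "memmax", "mapb")


def parse_type_row(row):
    """Split a 'By type' row into (type name, {column: value}).

    The type name may contain spaces, so the numbers are taken from the right.
    """
    fields = row.split()
    if len(fields) <= len(COLUMNS):
        raise ValueError("row has too few fields")
    name = " ".join(fields[:-len(COLUMNS)])
    values = [int(field) for field in fields[-len(COLUMNS):]]
    return name, dict(zip(COLUMNS, values))


name, stats = parse_type_row(
    "socket             1512  11342685      0     205 1478144 2426112     0"
)
# Fraction of socket allocation calls that had to wait.
delayed_share = stats["delayed"] / stats["calls"]
print(name, "failed:", stats["failed"], "delayed:", stats["delayed"])
```

For the socket row, failed is 0 and delayed is 205 out of more than 11 million calls, so the delayed share is tiny.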

I will still be waiting for the rest of the outputs and the iostat as well!

Good luck

Regards,
Khalid
 
I meant to say that none of the socket calls failed (some were delayed, but not enough to worry about).

I still think that you might benefit from AIO. It is not application-specific; it is an AIX feature, so you have to configure it in AIX, not in the application.

Regards,
Khalid
 
I set about 20 different vmo, ioo, no, and nfso options, along with using cio or dio where the filesystem warrants it for databases.

Your minperm and maxperm settings are strange for AIX 5.3; with lru_file_repage you should only need to set minperm% to a low number based on the amount of physical memory.

But the most glaring thing is the pending I/O, which calls for changing the pbufs. I don't know your environment, but since you notice waits once you go over 50 processes, that is a reasonable assumption given the blocked/pending I/O.

Run a 'vmstat -I 1' and watch the 'p' column as well as the wait column when the processes go over 50. Most likely the number in the p column will not be zero and the wait column will have a value too.

You can check if it is AIO by running 'pstat -a | grep aios | wc -l' and see how many servers are running and compare with how many you have set. Adjust if needed, but not likely your problem.

Since UniVerse isn't multithreaded, you might want to turn off SMT if it is running. SMT is available on POWER5 but not POWER4. 'smtctl' controls SMT.
 