High Scan Rate

Status
Not open for further replies.

kozlow

MIS
Mar 3, 2003
326
US
We are seeing very high scan rates (the sr column in vmstat, i.e. pages scanned per second by the page scanner looking for memory to free) on our SunOS 5.8 server.

The (b)locked, s(w)apped and (r)un queue counts are all normal (at or close to zero).

I ran prstat -s rss and it shows the oracle (LOCAL=NO) processes as the top memory users... This is an Oracle FinApps server, so that is to be expected. This server has been running for 5 years now without any problems. I checked the system file and it has not changed in over a year....

Oracle FinApps has not been changed either (as far as we can tell). FinApps is at an older version and we are looking to replace it, so I kind of believe it is static....

One day this week, believe it or not, the scan rate was between 400,000 and 700,000. Blocked was between 40 and 60. The swap disk drive was at 100% busy..... When we brought down FinApps, it cleared... Once restarted... blocked is between 0 and 3, but the scan rate still reaches as high as 11,000 and the next second... back to zero.....

????????Any thoughts or comments???????
 
All I can suggest is:
Your usage pattern might have changed, so users are running something which causes high memory usage.

Any problems with RAM? Have you lost any physical memory?

I would suggest you try to find out if the high scan rate is time related.

It does sound like your system is I/O bound due to the swapping.

You should check the swap device activity (iostat, sar -d), and look at memory usage using pmap (especially for your Oracle processes).

What does your vmstat -p suggest?

BTW, what do you mean by system file? Are you sure the Oracle config has not been changed?
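
As a rough way to put a number on the Oracle memory usage mentioned above, here is a minimal sketch that totals the resident set size of matching processes. The function name sum_rss is made up for illustration, and it assumes input in the two-column form produced by ps -eo rss,args (RSS in kilobytes first). Bear in mind that Oracle server processes all attach the same shared memory, so adding up RSS counts those shared pages once per process and overstates real usage.

```shell
# Hypothetical helper: total the RSS (KB) of processes whose command line
# matches a pattern. Expects "rss args" lines on stdin, e.g. from
#   ps -eo rss,args | sum_rss oracle
sum_rss() {
    awk -v pat="${1:-oracle}" '
    # Skip the header line: only rows starting with a number are counted.
    $1 ~ /^[0-9]+$/ && $0 ~ pat { kb += $1; n++ }
    END { printf("%d processes, %d KB\n", n, kb) }'
}
```

Treat the total as an upper bound only; pmap -x on an individual process gives a better split between shared and private memory.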
 
/etc/system

The server is Oracle FinApps, which is used all around the country 24x7, and I see the rate high all day long.

We opened a call with Oracle and they have been reviewing.

Since they pin memory and then use shared memory.... Wonder if we should increase the pools rather than decrease them...

Thanks for the response...
 
Can you post some vmstat output for one of the bad times?

How much memory is in the system? What is SHMMAX set to? How big is the Oracle SGA?

Annihilannic.
 
Only 4Gig of memory on this SunFire 480R.

set shmsys:shminfo_shmmax=4294967295

I created a little script to pull b r w and sr out of vmstat. At the end is a run of vmstat 10 10.

Scan Rate = 209, block in I/O = 0, Swap in Process = 0, Run Queue = 0
Scan Rate = 162, block in I/O = 0, Swap in Process = 0, Run Queue = 0
Scan Rate = 314, block in I/O = 0, Swap in Process = 0, Run Queue = 0
Scan Rate = 327, block in I/O = 0, Swap in Process = 0, Run Queue = 0
Scan Rate = 599, block in I/O = 0, Swap in Process = 0, Run Queue = 0
Scan Rate = 118, block in I/O = 0, Swap in Process = 0, Run Queue = 0
Scan Rate = 83, block in I/O = 0, Swap in Process = 0, Run Queue = 0
Scan Rate = 634, block in I/O = 0, Swap in Process = 0, Run Queue = 0
Scan Rate = 1029, block in I/O = 0, Swap in Process = 0, Run Queue = 0
Scan Rate = 503, block in I/O = 0, Swap in Process = 0, Run Queue = 0
Scan Rate = 763, block in I/O = 0, Swap in Process = 0, Run Queue = 0
Scan Rate = 1784, block in I/O = 1, Swap in Process = 0, Run Queue = 0
Scan Rate = 1073, block in I/O = 1, Swap in Process = 0, Run Queue = 0
Scan Rate = 1072, block in I/O = 2, Swap in Process = 0, Run Queue = 0
Scan Rate = 3014, block in I/O = 2, Swap in Process = 0, Run Queue = 1
Scan Rate = , block in I/O = memory, Swap in Process = page, Run Queue = procs
Scan Rate = sr, block in I/O = b, Swap in Process = w, Run Queue = r
Scan Rate = 1562, block in I/O = 1, Swap in Process = 0, Run Queue = 0
Scan Rate = 1307, block in I/O = 2, Swap in Process = 0, Run Queue = 0
Scan Rate = 10123, block in I/O = 2, Swap in Process = 0, Run Queue = 0
Scan Rate = 2565, block in I/O = 1, Swap in Process = 0, Run Queue = 0
Scan Rate = 1391, block in I/O = 1, Swap in Process = 0, Run Queue = 0
(alpha10) /cit/scripts # vmstat 10 10
procs memory page disk faults cpu
r b w swap free re mf pi po fr de sr s0 s3 s4 s6 in sy cs us sy id
0 0 0 1728144 182608 233 166 2026 214 431 720 3149 0 21 0 0 601 2178 852 15 4 81
0 0 0 812664 61904 175 294 1892 484 1211 43920 1410 0 0 0 0 966 3637 1563 41 5 55
0 0 0 808704 63960 190 67 3709 30 28 15336 0 0 0 0 0 991 2478 1382 23 4 73
0 0 0 804984 62656 723 218 7265 606 1290 39608 1008 0 20 0 0 1334 3191 1678 32 10 57
0 0 0 802200 62952 27 59 137 261 721 13824 513 0 1 1 0 569 2893 1170 37 4 59
0 0 0 799280 64488 18 69 38 93 92 4840 0 0 0 0 0 538 2604 955 28 2 70
0 0 0 799216 63048 160 35 1610 31 56 1712 21 0 0 0 0 693 2446 1092 28 4 68
0 0 0 799240 62800 168 64 1381 53 92 104 76 0 0 0 0 646 2817 1099 30 7 63
0 0 0 792960 63936 868 1546 7432 1079 2448 62728 1563 0 23 0 0 1334 6932 1862 44 14 42
0 0 0 791488 66744 287 1174 1612 420 741 41168 406 0 7 0 0 895 6915 1615 50 10 40
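
For anyone wanting to produce summary lines like the ones above, here is a minimal sketch of that kind of filter. It is not the actual script used for the output above (that was not posted); the function name scan_summary is made up, and it assumes the Solaris vmstat column order shown here, where r, b and w are fields 1 to 3 and sr is field 12. Unlike the output above, it skips the two header lines instead of printing them.

```shell
# Hypothetical filter: reduce Solaris vmstat output to sr, b, w and r.
# On Solaris 8 use nawk instead of awk.
# Usage: vmstat 10 10 | scan_summary
scan_summary() {
    awk '
    # Only data rows start with a number; header rows are ignored.
    $1 ~ /^[0-9]+$/ {
        printf("Scan Rate = %d, block in I/O = %d, Swap in Process = %d, Run Queue = %d\n", $12, $2, $3, $1)
    }'
}
```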

 
Unfortunately that vmstat output seems to be from a quiet time so it's difficult to draw any conclusions from it... if you happen to capture some when it's busy it would be interesting.

Annihilannic.
 
I was told by Sun Support that the scan rate should be Zero. Any other result means a memory problem...

I see scan rates in the vmstat from 0 to over 1000. I always discard the first line of vmstat, but that shows a scan rate of 3149....

My script shows scan rates from a low of 83 to a high of 10123. The script runs a vmstat and I read in the results and display the scan rate to make it easier to read..

Thanks for the help, but I still am not sure why I am seeing such high scan rates.

Can you look at your systems and let me know if any have a scan rate over zero?

 
Unfortunately the load on our systems has been decreasing lately as we move to Linux/Oracle 10g for most applications, but a scan rate of ~1000 was quite normal on some of the larger database servers. Yes, a scan rate of zero is ideal, but 1000 or so wouldn't worry me... a consistent rate of 10000 would be troubling though.
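
To tell an occasional spike from a consistently high rate, something like the following sketch could watch the vmstat stream. The function name sustained_scan and both defaults (a limit of 10000 and 3 consecutive samples) are assumptions for illustration, not recommended values. It assumes sr is field 12 as in the Solaris output above, and it does not discard the first since-boot sample.

```shell
# Hypothetical check: report when sr stays at or above a limit for a
# number of consecutive samples. Exit status 0 means a streak was seen.
# Usage: vmstat 10 | sustained_scan 10000 3
sustained_scan() {
    awk -v limit="${1:-10000}" -v need="${2:-3}" '
    $1 ~ /^[0-9]+$/ {
        if ($12 >= limit) run++; else run = 0
        # Report once per streak, at the moment it reaches the length.
        if (run == need) {
            print "sustained scan rate >= " limit " for " need " samples"
            hit = 1
        }
    }
    END { exit (hit ? 0 : 1) }'
}
```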

But the most important thing for performance analysis is the end result... are users complaining that the system is slow? Are jobs taking too long, or much longer than they used to?

Also, if this is Oracle Apps, do you run forms on another server or on the same server?

Annihilannic.
 
The scan rate should be judged in relation to the freed rate: the number of pages scanned against the number of pages freed. Say you have 10000 sr but only 1 fr; that would need to be corrected. But if you have 5000 sr and 2000 fr, then you are within reasonable operating performance (given other factors). You have nothing in the run queue or blocked, but you are doing constant page ins and page outs while still having quite a few free pages. Your context switches are a little high too, because they are consistently over 1000.

You just need to tune it so that it doesn't incur so many repage faults, where a page that was previously paged out is needed again and the VMM looks up its disk address and copies it from paging space back into RAM (page in). A page out occurs when a page needs to be removed from RAM temporarily; the page is copied to paging space only if the copy in RAM has been modified since it was paged in. A file system is the backing store for persistent (file) pages: files page to and from file systems, and everything else pages to paging space.

That is where you need to focus your attention in my opinion.
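
The sr to fr comparison described above can be sketched as a small filter. The function name sr_fr_ratio is made up, and the default limit of 4 scanned pages per freed page is only an illustrative assumption. Fields 10 (fr) and 12 (sr) follow the Solaris vmstat layout posted earlier in the thread.

```shell
# Hypothetical filter: print sr, fr and their ratio for each vmstat sample,
# flagging samples where sr exceeds limit * fr.
# Usage: vmstat 10 | sr_fr_ratio 4
sr_fr_ratio() {
    awk -v limit="${1:-4}" '
    $1 ~ /^[0-9]+$/ {
        fr = $10; sr = $12
        if (fr > 0)
            ratio = sprintf("%.1f", sr / fr)
        else
            ratio = "n/a"          # nothing freed, ratio undefined
        flag = (sr > limit * fr) ? " HIGH" : ""
        printf("sr=%d fr=%d sr/fr=%s%s\n", sr, fr, ratio, flag)
    }'
}
```

With the 10000 sr and 1 fr case mentioned above this flags the sample, while 5000 sr against 2000 fr passes with a limit of 4.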
 
I don't think the cs are too bad really, but it depends on the number of CPUs, etc.

Here is kozlow's vmstat output formatted neatly (I hate vmstat's default output format with a passion):

[tt]-- procs --  ---- memory -----  ----------------- page ------------------  ---- faults -----  --- cpu ---
  r   b   w      swap     free     re    mf    pi    po    fr    de    sr     in    sy    cs   us  sy  id
  0   0   0    812664    61904    175   294  1892   484  1211 43920  1410    966  3637  1563   41   5  55
  0   0   0    808704    63960    190    67  3709    30    28 15336     0    991  2478  1382   23   4  73
  0   0   0    804984    62656    723   218  7265   606  1290 39608  1008   1334  3191  1678   32  10  57
  0   0   0    802200    62952     27    59   137   261   721 13824   513    569  2893  1170   37   4  59
  0   0   0    799280    64488     18    69    38    93    92  4840     0    538  2604   955   28   2  70
  0   0   0    799216    63048    160    35  1610    31    56  1712    21    693  2446  1092   28   4  68
  0   0   0    799240    62800    168    64  1381    53    92   104    76    646  2817  1099   30   7  63
  0   0   0    792960    63936    868  1546  7432  1079  2448 62728  1563   1334  6932  1862   44  14  42
  0   0   0    791488    66744    287  1174  1612   420   741 41168   406    895  6915  1615   50  10  40
[/tt]

For comparison, here is a sample from our Solaris 8, Sun Enterprise 5500, 10GB RAM, 12 x 400MHz US-II CPUs at a relatively quiet time. It is also running Oracle Financials, however the forms server is separate:

[tt]-- procs --  ---- memory -----  ----------------- page ------------------  ---- faults -----  --- cpu ---
  r   b   w      swap     free     re    mf    pi    po    fr    de    sr     in    sy    cs   us  sy  id
  0   1   0   7573848  6419872    332  1044  6704     1     1     0     0   2261  9360  1394   26   4  70
  0   0   0   7569952  6419176    577  2425  4460    12    11     0     0   2146 15894  1957   31   7  63
  0   0   0   7572368  6425160    332  2264  4915     1     0     0     0   2101 13242  1752   27   7  67
  0   0   0   7575816  6423696    456  1778  3643    19    19     0     0   2010 11812  1584   25   5  69
  0   1   0   7575416  6420240    570   996  9828     0     0     0     0   2222  5976  1594   17   4  78
  0   0   0   7576112  6419008    605  2208  4784    11    11     0     0   2025  8504  1787   16   6  78
  0   1   0   7582856  6426336    337  1907  5451    11    11     0     0   2159 10637  1536   20   6  74
  0   1   0   7581152  6422344    302   984  4291     6     6     0     0   2000  6921  1482   16   4  81
  0   1   0   7582216  6418800    631   959  1744    89    19     0     0   1926 10306  1197   15   5  80[/tt]

Annihilannic.
 
Thanks for the ideas.... Will take it from here....

I only keep vmstat output for a week (I overwrite Monday on Monday, Tuesday on Tuesday, etc.), so it may be that we have had these scan rates for a while now.

The daily report that I keep only records the user, system, and idle utilization from vmstat, and I have been keeping those for over a year now.... May need to add additional fields....
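
A weekday-rotated log that also keeps the paging fields could look roughly like this sketch. Both function names, the /var/tmp default and the choice of fields are assumptions for illustration. The cpu fields at positions 20 to 22 assume four disk columns, as in the vmstat output posted above.

```shell
# Hypothetical helper: per-weekday log path, e.g. /var/tmp/vmstat.Mon,
# so each day's file overwrites the one from a week earlier.
vmstat_logfile() {
    echo "${1:-/var/tmp}/vmstat.$(LC_ALL=C date +%a)"
}

# Hypothetical filter: timestamp each sample and keep r, b, w, fr, sr,
# us, sy and id. On Solaris 8 use nawk instead of awk.
vmstat_fields() {
    awk '$1 ~ /^[0-9]+$/ {
        "date +%H:%M:%S" | getline now
        close("date +%H:%M:%S")
        print now, $1, $2, $3, $10, $12, $20, $21, $22
    }'
}

# Example, started from cron at midnight (one sample a minute for a day):
#   vmstat 60 1440 | vmstat_fields > "$(vmstat_logfile)"
```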

 
Have a good look through /var/adm/messages.
Agreed that sr=zero is ideal, but I wouldn't worry unless it is excessive, as with your >40,000 figure.
I also see that you are down to only about 60MB of free memory out of 4GB; that doesn't leave a lot when hitting a busy time.
Oh, and don't forget that the first line of vmstat output is an average since boot, so ignore it.

Please post again if/when it reoccurs.
 
Nice formatting, Annihilannic, makes it easier to read :)

I don't know the page size, but as I mentioned before, there are quite a lot of free pages. Plus it is doing a lot of page ins, meaning the VMM is reading pages in from disk because they previously existed there. With Oracle you don't want to read from disk, so you should push memory use as close to 100% as you can so Oracle doesn't have to keep paging in. Your fr to sr ratio isn't over 1:4, so it isn't anything I would look at.

If you have compute-bound processes you will show a cs rate of whatever the time slice is. In AIX the time slice is 1/100th of a second, so you usually see about 100 in cs. A rate greater than that means that processes are voluntarily relinquishing the CPU by performing a blocking I/O system call or a blocking IPC call.

A vmstat -s will show the file system paging which may be helpful.
 
kHz,

Here is the script I use called neatvmstat, if you are interested...

Code:
#!/bin/ksh

UNAME=$(uname)

case ${UNAME} in
        SunOS)
                # procs     memory            page            disk          faults      cpu
                # r b w   swap  free  si  so pi po fr de sr s0 s1 s1 s1   in   sy   cs us sy id
                # 0 0 0  47400 40240   0   0 794 10 244 0 49 3  0  3  0  473 4843  728  9 15 76
                # 0 0 0 5509480 124760 0   0 864 0 718 0 226 1  0  0  0  652 11845 328  5  5 91

                # NOTE: $17-$22 below assume exactly four disk columns, as in the samples above.
                vmstat "$@" | nawk '
                BEGIN { FIRST=1 }
                /^ p/ {
                        system("echo `date`: load average: `uptime | sed -e \"s/^.*average: //\"`");
                        printf("-- procs --  ---- memory -----  ----------------- page ------------------  ---- faults -----  --- cpu ---\n");
                }
                /^ r/ {
                        printf("%3s %3s %3s  %8s %8s  %5s %5s %5s %5s %5s %5s %5s  %5s %5s %5s  %3s %3s %3s\n", $1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $17, $18, $19, $20, $21, $22);
                }
                /^ [0-9]/ && ! FIRST {
                        printf("%3d %3d %3d  %8d %8d  %5d %5d %5d %5d %5d %5d %5d  %5d %5d %5d  %3d %3d %3d\n", $1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $17, $18, $19, $20, $21, $22);
                }
                /^ [0-9]/ && FIRST { FIRST=0 }
                '
                ;;

        Linux)
                # Red Hat Linux Advanced Server release 2.1AS (Pensacola)
                #   procs                      memory    swap          io     system         cpu
                # r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
                # 3  0  0 1088940   5112  11524 128736   2   2     4     1    3     3   1   3   1
                # 1  0  0 1088940   5528  11564 128136   0   0     1    46  130   407   2   4  95
                # 0  0  0 1088940   5512  11596 128180   0   0     1    30  137   431  18   1  81

                # Red Hat Enterprise Linux AS release 3 (Taroon Update 2)
                #procs                      memory      swap          io     system         cpu
                # r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
                # 0  0      0 31830364 120744 584640    0    0     6     8   30    98  0  0 98  1
                # 0  0      0 31830356 120752 584640    0    0     8     9  176   109  0  0 100  0

                vmstat "$@" | awk '
                BEGIN { FIRST=1 }
                /^   p/ {
                        system("echo `date`: load average: `uptime | sed -e \"s/^.*average: //\"`");
                        printf("--- procs ----  ------------- memory --------------  --- swap --  --- io ----  - system --  --- cpu ---\n");
                        RELEASE="2.1AS"
                }
                /^p/ {
                        system("echo `date`: load average: `uptime | sed -e \"s/^.*average: //\"`");
                        printf("- procs -  ------------- memory -------------  --- swap --  --- io ----  - system --  ----- cpu -----\n");
                        RELEASE="3.0AS"
                }
                /^ r/ && RELEASE=="2.1AS" {
                        printf("%4s %4s %4s  %8s %8s %8s %8s  %5s %5s  %5s %5s  %5s %5s  %3s %3s %3s\n", $1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16);
                }
                /^ r/ && RELEASE=="3.0AS" {
                        printf("%4s %4s  %7s %8s %8s %8s  %5s %5s  %5s %5s  %5s %5s  %3s %3s %3s %3s\n", $1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16);
                }
                /^ [0-9]/ && ! FIRST && RELEASE=="2.1AS" {
                        printf("%4d %4d %4d  %8d %8d %8d %8d  %5d %5d  %5d %5d  %5d %5d  %3d %3d %3d\n", $1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16);
                }
                /^ [0-9]/ && ! FIRST && RELEASE=="3.0AS" {
                        printf("%4d %4d  %7d %8d %8d %8d  %5d %5d  %5d %5d  %5d %5d  %3d %3d %3d %3d\n", $1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16);
                }
                /^ [0-9]/ && FIRST { FIRST=0 }
                '
                ;;
        *)
                echo "Sorry, unknown operating system: ${UNAME}"
                ;;
esac

Annihilannic.
 