
Monitor DB Health.

Status
Not open for further replies.

jpt54

MIS
Jan 14, 2003
5
FR
Hi,

I'm trying to build a script that retrieves key "health" ratios from my Informix 7.3 database.

Here is the information I collect, most of it from onstat -p output:

ReadCache_pct_used
WriteCache_pct_used
BufferWait_pct (= (bufwaits / (pagreads + bufwrits)) * 100)
LockWaitpct (= (lokwaits / lockreqs) * 100)
ReadAheadpct (= (RA-pgsused / (ixda-RA + idx-RA + da-RA)) * 100)
PhysBuff_pct_used
LogicalBuff_pct_used
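
The ratios listed above that come from onstat -p could be sketched in Python roughly as follows. This is only a sketch under assumptions, not the poster's script: the function name health_ratios and the dictionary keys are made up for illustration (the keys mirror the onstat -p column names), and the cache percentages use the (bufreads - dskreads) / bufreads form, which reproduces the %cached values shown in the output below. The physical and logical buffer percentages are not in onstat -p output and are left out.

```python
def health_ratios(c):
    """Compute percentage ratios from a dict of onstat -p counters."""
    def pct(num, den):
        # Guard against a zero denominator right after a counter reset.
        return 100.0 * num / den if den else 0.0

    return {
        "read_cache_pct": pct(c["bufreads"] - c["dskreads"], c["bufreads"]),
        "write_cache_pct": pct(c["bufwrits"] - c["dskwrits"], c["bufwrits"]),
        "buffer_wait_pct": pct(c["bufwaits"], c["pagreads"] + c["bufwrits"]),
        "lock_wait_pct": pct(c["lokwaits"], c["lockreqs"]),
        "read_ahead_pct": pct(
            c["RA-pgsused"], c["ixda-RA"] + c["idx-RA"] + c["da-RA"]
        ),
    }


# Counter values copied from the first onstat -p output in this thread.
counters = {
    "dskreads": 90711544, "pagreads": 5504961, "bufreads": 1182913300,
    "dskwrits": 2950967, "bufwrits": 27997598,
    "bufwaits": 63458950, "lokwaits": 3, "lockreqs": 1158986121,
    "ixda-RA": 38381840, "idx-RA": 269651, "da-RA": 26250774,
    "RA-pgsused": 64741676,
}

for name, value in health_ratios(counters).items():
    print(f"{name}: {value:.2f}")
```
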

My first question is:

Did I forget any useful information for monitoring the DB health?
Are there other key ratios that I should monitor?


The other question is about a weird result I get from my script:
My BufferWait_pct value is about 170 to 190???
I tried running onstat -z: the ratio fell to zero and then grew back to 180 in less than one day!
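
Since the onstat -p counters accumulate until the server restarts or onstat -z is run, one option for a monitoring script is to compute the ratio over the interval between two snapshots instead of resetting the counters. A minimal Python sketch of that idea follows; the function name interval_buffer_wait_pct and the sample numbers are hypothetical and only illustrate the arithmetic.

```python
def interval_buffer_wait_pct(prev, curr):
    """Buffer wait ratio (%) over the interval between two snapshots."""
    keys = ("bufwaits", "pagreads", "bufwrits")
    delta = {k: curr[k] - prev[k] for k in keys}
    if any(v < 0 for v in delta.values()):
        # Counters went backwards: the server restarted or onstat -z was run.
        return None
    denom = delta["pagreads"] + delta["bufwrits"]
    return 100.0 * delta["bufwaits"] / denom if denom else 0.0


# Hypothetical snapshots, for illustration only (not taken from this thread).
earlier = {"bufwaits": 1000, "pagreads": 4000, "bufwrits": 1000}
later = {"bufwaits": 1600, "pagreads": 6000, "bufwrits": 2000}
print(interval_buffer_wait_pct(earlier, later))
```
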


Can someone help me?

Thanks!



--------------------------------------------------
Informix Dynamic Server Version 7.31.UC5 -- On-Line (Prim) -- Up 12 days 06:46:08 -- 98304 Kbytes

Profile
dskreads  pagreads  bufreads    %cached  dskwrits  pagwrits  bufwrits  %cached
90711544  5504961   1182913300  92.33    2950967   3818322   27997598  89.46

isamtot    open     start     read       write    rewrite  delete  commit  rollbk
622102440  5906529  58067206  384760327  7652773  5127270  739642  62575   1713

gp_read  gp_write  gp_rewrt  gp_del  gp_alloc  gp_free  gp_curs
0        0         0         0       0         0        0

ovlock  ovuserthread  ovbuff  usercpu  syscpu   numckpts  flushes
0       0             5       7579.80  4444.24  935       2880

bufwaits  lokwaits  lockreqs    deadlks  dltouts  ckpwaits  compress  seqscans
63458950  3         1158986121  0        0        131       126545    169667

ixda-RA   idx-RA  da-RA     RA-pgsused  lchwaits
38381840  269651  26250774  64741676    88241
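
A script could pick the counters out of output like the above by pairing each line of column names with the line of numbers that follows it. The Python sketch below is one possible approach, not an official parser: the function name parse_onstat_p is made up, the "_2" suffix for the second %cached column is an arbitrary choice, and it assumes the name/value line layout shown in this thread.

```python
def parse_onstat_p(text):
    """Parse onstat -p text into a dict of counter name -> number."""
    def is_number(tok):
        try:
            float(tok)
            return True
        except ValueError:
            return False

    counters = {}
    lines = [ln.split() for ln in text.splitlines()]
    for header, values in zip(lines, lines[1:]):
        # A header/value pair: same token count, names above, numbers below.
        if (header and len(header) == len(values)
                and not any(is_number(t) for t in header)
                and all(is_number(t) for t in values)):
            for name, tok in zip(header, values):
                if name in counters:  # e.g. the second %cached column
                    name += "_2"
                counters[name] = float(tok) if "." in tok else int(tok)
    return counters


# Excerpt of the onstat -p output posted in this thread.
SAMPLE = """\
Profile
dskreads pagreads bufreads %cached dskwrits pagwrits bufwrits %cached
90711544 5504961 1182913300 92.33 2950967 3818322 27997598 89.46

ovlock ovuserthread ovbuff usercpu syscpu numckpts flushes
0 0 5 7579.80 4444.24 935 2880

bufwaits lokwaits lockreqs deadlks dltouts ckpwaits compress seqscans
63458950 3 1158986121 0 0 131 126545 169667
"""

for name, value in parse_onstat_p(SAMPLE).items():
    print(name, value)
```
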

--------------------------------------------

#**************************************************************************
#
# INFORMIX SOFTWARE, INC.
#
# Title: onconfig
# Description: Informix Dynamic Server Configuration Parameters
#
#**************************************************************************

# Root Dbspace Configuration

ROOTNAME rootdbs # Root dbspace name
ROOTPATH /u/informix/rrootdbs # Path for device containing root dbspace
ROOTOFFSET 0 # Offset of root dbspace into device (Kbytes)
ROOTSIZE 50000 # Size of root dbspace (Kbytes)

# Disk Mirroring Configuration Parameters

MIRROR 0 # Mirroring flag (Yes = 1, No = 0)
MIRRORPATH # Path for device containing mirrored root
MIRROROFFSET 0 # Offset into mirrored device (Kbytes)

# Physical Log Configuration

PHYSDBS rootdbs # Location (dbspace) of physical log
PHYSFILE 10000 # Physical log file size (Kbytes)

# Logical Log Configuration

LOGFILES 80 # Number of logical log files
LOGSIZE 5000 # Logical log size (Kbytes)

# Diagnostics

MSGPATH /u/informix/online.log # System message log file path
CONSOLE /dev/null # System console message path
ALARMPROGRAM /u/informix/etc/log_full.sh # Alarm program path
SYSALARMPROGRAM /u/informix/etc/evidence.sh # System Alarm program path
TBLSPACE_STATS 1

# System Archive Tape Device
# EXPLOITATION
TAPEDEV /dev/rStp0 # Tape device path
TAPEBLK 16 # Tape block size (Kbytes)
TAPESIZE 35000000 # Maximum amount of data to put on tape (Kbytes)

# Log Archive Tape Device
# EXPLOITATION
LTAPEDEV /u/informix/log/tapelog
LTAPEBLK 4 # Log tape block size (Kbytes)
LTAPESIZE 2000000 # Max amount of data to put on log tape (Kbytes)

# Optical

STAGEBLOB # Informix Dynamic Server/Optical staging area

# System Configuration

SERVERNUM 1 # Unique id corresponding to a Dynamic Server instance

NETTYPE ipcshm,1,100,NET # Configure poll thread(s) for nettype
NETTYPE tlitcp,1,100,NET # Configure poll thread(s) for nettype
DEADLOCK_TIMEOUT 60 # Max time to wait of lock in distributed env.
RESIDENT 1 # Forced residency flag (Yes = 1, No = 0)

MULTIPROCESSOR 0 # 0 for single-processor, 1 for multi-processor
NUMCPUVPS 1 # Number of user (cpu) vps
SINGLE_CPU_VP 1 # If non-zero, limit number of cpu vps to one

NOAGE 1 # Process aging
AFF_SPROC 0 # Affinity start processor
AFF_NPROCS 0 # Affinity number of processors

# Shared Memory Parameters

LOCKS 1000000 # Maximum number of locks
BUFFERS 5000 # Maximum number of shared buffers
NUMAIOVPS 3 # Number of IO vps
PHYSBUFF 128 # Physical log buffer size (Kbytes)
LOGBUFF 32 # Logical log buffer size (Kbytes)
LOGSMAX 400 # Maximum number of logical log files
CLEANERS 4 # Number of buffer cleaner processes
SHMBASE 0x0 # Shared memory base address
#SHMBASE 0x82000000L # Shared memory base address
SHMVIRTSIZE 16000 # initial virtual shared memory segment size
SHMADD 2048 # Size of new shared memory segments (Kbytes)
SHMTOTAL 0 # Total shared memory (Kbytes). 0=>unlimited
CKPTINTVL 300 # Check point interval (in sec)
LRUS 8 # Number of LRU queues
LRU_MAX_DIRTY 15 # LRU percent dirty begin cleaning limit
LRU_MIN_DIRTY 5 # LRU percent dirty end cleaning limit
LTXHWM 50 # Long transaction high water mark percentage
LTXEHWM 60 # Long transaction high water mark (exclusive)
TXTIMEOUT 0x12c # Transaction timeout (in sec)
STACKSIZE 32 # Stack size (Kbytes)

OFF_RECVRY_THREADS 10 # Default number of offline worker threads
ON_RECVRY_THREADS 1 # Default number of online worker threads

# Data Replication Variables
# DRAUTO: 0 manual, 1 retain type, 2 reverse type
DRAUTO 0 # DR automatic switchover
DRINTERVAL -1 # DR max time between DR buffer flushes (in sec)
DRTIMEOUT 240 # DR network timeout (in sec)
DRLOSTFOUND /u/informix/etc/dr.lostfound # DR lost+found file path

# CDR Variables
CDR_LOGBUFFERS 2048 # size of log reading buffer pool (Kbytes)
CDR_EVALTHREADS 1,2 # evaluator threads (per-cpu-vp,additional)
CDR_DSLOCKWAIT 5 # DS lockwait timeout (seconds)
CDR_QUEUEMEM 4096 # Maximum amount of memory for any CDR queue (Kbytes)

# Backup/Restore variables
BAR_ACT_LOG /tmp/bar_act.log
BAR_MAX_BACKUP 0
BAR_RETRY 1
BAR_NB_XPORT_COUNT 10
BAR_XFER_BUF_SIZE 31

# Informix Storage Manager variables
ISM_DATA_POOL ISMData # If the data pool name is changed, be sure to
# update $INFORMIXDIR/bin/onbar. Change to
# ism_catalog -create_bootstrap -pool <new name>
ISM_LOG_POOL ISMLogs

# Read Ahead Variables
RA_PAGES # Number of pages to attempt to read ahead
RA_THRESHOLD # Number of pages left before next group

DBSPACETEMP tmpdbs # Default temp dbspaces

# DUMP*:
# The following parameters control the type of diagnostics information which
# is preserved when an unanticipated error condition (assertion failure) occurs
# during Dynamic Server operations.
# For DUMPSHMEM, DUMPGCORE and DUMPCORE 1 means Yes, 0 means No.

DUMPDIR /tmp # Preserve diagnostics in this directory
DUMPSHMEM 1 # Dump a copy of shared memory
DUMPGCORE 0 # Dump a core image using 'gcore'
DUMPCORE 0 # Dump a core image (Warning:this aborts Dynamic Server)
DUMPCNT 1 # Number of shared memory or gcore dumps for
# a single user's session

FILLFACTOR 90 # Fill factor for building indexes

# method for Dynamic Server to use when determining current time
USEOSTIME 0 # 0: use internal time(fast), 1: get time from OS(slow)

# Parallel Database Queries (pdq)
MAX_PDQPRIORITY 100 # Maximum allowed pdqpriority
DS_MAX_QUERIES # Maximum number of decision support queries
DS_TOTAL_MEMORY # Decision support memory (Kbytes)
DS_MAX_SCANS 1048576 # Maximum number of decision support scans
DATASKIP off

OPTCOMPIND 0 # To hint the optimizer

ONDBSPACEDOWN 2 # Dbspace down option: 0 = CONTINUE, 1 = ABORT, 2 = WAIT
LBU_PRESERVE 0 # Preserve last log for log backup
OPCACHEMAX 0 # Maximum optical cache size (Kbytes)

# HETERO_COMMIT (Gateway participation in distributed transactions)
# 1 => Heterogeneous Commit is enabled
# 0 (or any other value) => Heterogeneous Commit is disabled
HETERO_COMMIT 0

# Optimization goal: -1 = ALL_ROWS(Default), 0 = FIRST_ROWS
OPT_GOAL -1

# Optimizer DIRECTIVES ON (1/Default) or OFF (0)
DIRECTIVES 1

# Status of restartable restore
RESTARTABLE_RESTORE off
PC_POOLSIZE 200 # Number of procedures loaded in memory = 200/2-1 = 99
CDR_LOGDELTA 30 # % of log space allowed in queue memory
CDR_NUMCONNECT 16 # Expected connections per server
CDR_NIFRETRY 300 # Connection retry (seconds)
CDR_NIFCOMPRESS 0

 
That just means the application has had to wait for buffers, a lot. Looking at your onconfig parameters, BUFFERS is set pretty low. I would increase it to around 50000 (or higher) and see how your percentage changes. For IDS, lots of buffers are good, if you have the memory.
 
Thanks for your answers.

I will try to increase the BUFFERS value.

The "Healthcheck" script is great, but I would be really pleased if an Informix specialist could give me a ranking of the most useful ratios to check:
- The ones that you HAVE TO check (that's what I called "key ratios")
- The ones that could be useful to check
- Finally, the ones that could be checked for full coverage

Sorry for my English, I hope it is understandable...


Thanks again!
Jp.
 
One correction to the above: the buffer wait number is an absolute number and is relatively meaningless on its own. What you need to look at is the buffer wait RATIO, as follows: ratio = (bufwaits / (pagreads + bufwrits)) * 100. A ratio above 20% needs to be addressed; a ratio between 10% and 20% should be monitored.
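
As a rough Python sketch of this rule of thumb, with the sum of pagreads and bufwrits as the denominator (matching the definition in the original post): the function names and the "address" / "monitor" / "ok" labels are made up for illustration, and the thresholds are simply the 10% and 20% figures quoted above.

```python
def buffer_wait_ratio(bufwaits, pagreads, bufwrits):
    """Buffer wait ratio in percent: bufwaits / (pagreads + bufwrits) * 100."""
    total = pagreads + bufwrits
    return 100.0 * bufwaits / total if total else 0.0


def classify(ratio_pct):
    """Map a ratio to the rule of thumb: >20% address, 10-20% monitor."""
    if ratio_pct > 20:
        return "address"
    if ratio_pct >= 10:
        return "monitor"
    return "ok"


# Values from the first onstat -p output in this thread.
ratio = buffer_wait_ratio(bufwaits=63458950, pagreads=5504961, bufwrits=27997598)
print(f"{ratio:.1f}% -> {classify(ratio)}")
```
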
 
Ouch, I need more help.

I have raised the BUFFERS value to 50000 (it was 5000) and the buffer wait ratio is still very high:

-------------------------------------------------------------------
Informix Dynamic Server Version 7.31.UC5 -- On-Line (Prim) -- Up 4 days 01:39:09 -- 176128 Kbytes

Profile
dskreads  pagreads  bufreads   %cached  dskwrits  pagwrits  bufwrits  %cached
11318557  3006858   109118692  89.63    7238      17597     35929     79.85

isamtot   open    start   read      write  rewrite  delete  commit  rollbk
51281619  214568  901356  47583817  6066   1279     3522    338     3

gp_read  gp_write  gp_rewrt  gp_del  gp_alloc  gp_free  gp_curs
0        0         0         0       0         0        0

ovlock  ovuserthread  ovbuff  usercpu  syscpu  numckpts  flushes
0       0             0       413.62   331.02  12        24

bufwaits  lokwaits  lockreqs  deadlks  dltouts  ckpwaits  compress  seqscans
7661949   0         90524911  0        0        0         3938      1981

ixda-RA  idx-RA  da-RA  RA-pgsused  lchwaits
8411641  2520    50767  8464834     696
-------------------------------------------------------------------


(7661949/(3006858 + 35929))*100 = 251.8 !!!!!

Should I add more buffers? What is the problem?



Thanks.

 
Any idea about my buffer wait problem?

Thanks!


 
Hi,

Bufwaits can be due to the LRU and read-ahead settings and
to the number of BUFFERS,
so check the LRUs with onstat -R and onstat -F and
try increasing the BUFFERS parameter a little more.
Concerning read-ahead, you didn't specify any value in
the $ONCONFIG, but your ratio is fine.

Regards,

Jean

 