SAN storage access problem and system reaction

Status
Not open for further replies.

ogniemi

Technical User
Nov 7, 2003
1,041
PL

Is it common behaviour that a system running a DB, an application, and user sessions appears frozen right after storage access is lost or blocked for several seconds? What does the VMM usually do with memory pages (computational pages and file pages) in such a case?

 

Sometimes I get errors like:

Code:
LABEL:          TS_NIM_ERROR_STUCK_
IDENTIFIER:     3D32B80D

Date/Time:       Tue Sep 14 20:45:57 CET 2010
Sequence Number: 16966
Machine Id:      000C77037100
Node Id:         aserver
Class:           S
Type:            PERM
Resource Name:   topsvcs

Description
NIM thread blocked

Probable Causes
A thread in a Topology Services Network Interface Module (NIM) process
was blocked
Topology Services NIM process cannot get timely access to CPU

User Causes
Excessive memory consumption is causing high memory contention
Excessive disk I/O is causing high memory contention

        Recommended Actions
        Examine I/O and memory activity on the system
        Reduce load on the system
        Tune virtual memory parameters
        Call IBM Service if problem persists

Failure Causes
Excessive virtual memory activity prevents NIM from making progress
Excessive disk I/O traffic is interfering with paging I/O

        Recommended Actions
        Examine I/O and memory activity on the system
        Reduce load on the system
        Tune virtual memory parameters
        Call IBM Service if problem persists

........

Thread which was blocked
receive thread
Interval in seconds during which process was blocked
          60
Interface name
rhdisk4

The system was available the whole time. The nmon statistics show no abnormal behaviour in CPU, memory, or I/O usage. The only errors are the SAN disk errors in errpt and the one above coming from topsvcs. Why does the system appear frozen during the SAN storage errors while no performance issues are logged in nmon? rootvg is on internal drives.
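
To line up the topsvcs entries with the SAN disk errors, it may help to pull a few fields out of each error report entry. Below is a minimal Python sketch that does this for text laid out like the entry above. The function name `parse_errpt_entry` and the dictionary keys are made up for illustration, and the sample is just a shortened copy of the entry quoted in this post; other entry types may be laid out differently.

```python
import re

def parse_errpt_entry(text):
    """Extract a few fields from one error report entry (hypothetical helper)."""
    fields = {}
    # Header lines look like "LABEL:          TS_NIM_ERROR_STUCK_".
    for key in ("LABEL", "Date/Time", "Resource Name"):
        m = re.search(rf"^{re.escape(key)}:\s*(.+)$", text, re.MULTILINE)
        if m:
            fields[key] = m.group(1).strip()
    # Detail data: a caption line followed by its value on the next line.
    lines = [ln.strip() for ln in text.splitlines()]
    for i, ln in enumerate(lines[:-1]):
        if ln.startswith("Interval in seconds"):
            fields["blocked_seconds"] = int(lines[i + 1])
        elif ln == "Interface name":
            fields["interface"] = lines[i + 1]
    return fields

# Shortened copy of the entry quoted above.
sample = """LABEL:          TS_NIM_ERROR_STUCK_
Date/Time:       Tue Sep 14 20:45:57 CET 2010
Resource Name:   topsvcs
Thread which was blocked
receive thread
Interval in seconds during which process was blocked
          60
Interface name
rhdisk4
"""

print(parse_errpt_entry(sample))
```

With the timestamp, blocked interval, and interface name extracted, the entries can be sorted next to the disk errors to see whether each blocked interval falls inside a SAN outage window.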
 
Don't confuse the Network Interface Module with NIM (Network Installation Management). At first glance this looks like a network bottleneck.

Can you supply a little more info?

AIX ver
Cluster?



Mike

"Whenever I dwell for any length of time on my own shortcomings, they gradually begin to seem mild, harmless, rather engaging little things, not at all like the staring defects in other people's characters."
 

But as you can see in my first post, the problem concerns the diskhb network, that is, heartbeat (HB) communication over SAN:

Thread which was blocked
receive thread
Interval in seconds during which process was blocked
60
Interface name
rhdisk4

This is PowerHA configuration.
 

I have already changed these from the default values of 0/0 to:

maxpout 513 HIGH water mark for pending write I/Os per file True
minpout 256 LOW water mark for pending write I/Os per file True
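
For anyone unfamiliar with the high/low water mark idea behind these two attributes, here is a toy Python model, not the AIX implementation. It assumes a single writer and a single file, and treats time as discrete ticks. The function name and all parameters other than `maxpout` and `minpout` are invented for illustration. The writer is put to sleep once pending writes reach the high water mark and is woken only when they have drained to the low water mark or below; 0 means pacing is disabled.

```python
def simulate_pacing(maxpout, minpout, writes_per_tick, completions_per_tick, ticks):
    """Toy model of write pacing with high/low water marks.

    Returns the number of ticks the writer spent asleep.
    """
    if maxpout == 0:
        return 0       # 0 disables pacing, so the writer is never held back
    pending = 0        # write I/Os queued against the file
    sleeping = False   # True while the writer is held back
    asleep_ticks = 0
    for _ in range(ticks):
        # The storage completes some of the pending I/Os.
        pending = max(0, pending - completions_per_tick)
        # A sleeping writer is woken only at or below the low water mark.
        if sleeping and pending <= minpout:
            sleeping = False
        if sleeping:
            asleep_ticks += 1
            continue
        # The writer queues more I/Os, capped at the high water mark.
        pending = min(maxpout, pending + writes_per_tick)
        if pending >= maxpout:
            sleeping = True
    return asleep_ticks

# Healthy storage: completions keep up, the writer never sleeps.
print(simulate_pacing(513, 256, 100, 200, 20))
# Stalled storage: nothing completes, the writer sleeps once 513 is reached.
print(simulate_pacing(513, 256, 100, 0, 20))
```

In this model, a storage stall means a paced writer stays asleep for as long as nothing completes. Whether that matches what happens on a real system during a SAN outage is a separate question.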


I don't remember exactly, but this value was probably auto-tuned after some upgrade or because of the hardware/disk configuration. This is a P6 system with SAS disk drives in rootvg.

As far as I know, these values are not easy to tune (IBM's own sites say so as well). The facts and features document shows that from AIX 6.1 the defaults for minpout/maxpout are 4096/8192 (AIX 5.3 and lower have 0/0). On my VIOS 2.2 it is also 4096/8192.

I am not sure whether I can also change to 4096/8192 on my AIX 5.3 systems, and I have no idea how to find the best values.

Any idea?

 