First I would type
ps aux | more
which will show processes in order of CPU usage, a page at a time, with the highest usage at the top of the list. Ignore the kproc entries at the top (one for each processor); these are the 'soak' processes, which use up all otherwise unused CPU cycles.
You may find something at the top of this list which is a dead giveaway.
If you don't, then it is time to do some more digging, and this is where things get a bit more vague.
Try
vmstat 3 20
which will take a snapshot every 3 seconds for 1 minute (do this while the system is struggling). You will see output like the following (I've adjusted the columns a bit to avoid wrap-around):
kthr  memory       page              faults          cpu
----- ------------ ----------------- --------------- -----------
 r  b    avm   fre re pi po fr sr cy   in    sy   cs us sy id wa
 3  2 129855 11995  0  0  0  0  0  0  555 31203  368 36 19 31 14
 5  2 129975 11709  0  0  0  0  0  0  552 17757  445 40 35 16  9
 3  3 130079 11344  0  0  0  0  0  0  596 14579  493 36 30 20 14
The last four columns (us, sy, id and wa) show the percentage of CPU time spent in User mode, in System mode, Idle, and Waiting for I/O; the four always add up to 100.
If us + sy is consistently above 80, then you are probably suffering from a CPU bottleneck, i.e. your RS/6000 just doesn't have the CPU power to handle the workload you are putting on it.
If the wa column is consistently non-zero, this may indicate an I/O bottleneck on disk.
If the id column is consistently high, then this would indicate that your problem is not down to high CPU usage.
Also look at the pi / po columns under the 'page' heading. If these are consistently non-zero, this indicates that your system is low on memory (check the fre column) and is paging heavily, which will seriously hurt performance.
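If you would rather not eyeball twenty snapshots, the checks above can be scripted. What follows is only a rough sketch in Python, not an AIX tool: the function names parse and diagnose are made up for illustration, the second sy column is renamed sy_cpu so the two don't clash, and "consistently" is assumed to mean "in every sample", which you may want to relax. The sample rows are the three from the vmstat output above.

```python
# Column order as in the vmstat header shown above; the CPU "sy" is
# renamed "sy_cpu" here (an assumption for this sketch) to avoid
# clashing with the syscall "sy" column under 'faults'.
COLUMNS = "r b avm fre re pi po fr sr cy in sy cs us sy_cpu id wa".split()

# The three sample rows from the vmstat output above.
SAMPLES = """\
3 2 129855 11995 0 0 0 0 0 0 555 31203 368 36 19 31 14
5 2 129975 11709 0 0 0 0 0 0 552 17757 445 40 35 16 9
3 3 130079 11344 0 0 0 0 0 0 596 14579 493 36 30 20 14
"""


def parse(text):
    """Turn vmstat data lines into dicts; header and ruler lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # A data row has exactly 17 fields, all of them plain integers.
        if len(fields) == len(COLUMNS) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(COLUMNS, map(int, fields))))
    return rows


def diagnose(rows, busy_pct=80):
    """Apply the rules of thumb; 'consistently' is taken as 'every sample'."""
    findings = []
    if not rows:
        return findings
    if all(r["us"] + r["sy_cpu"] > busy_pct for r in rows):
        findings.append("cpu")        # us + sy above 80 throughout
    if all(r["wa"] > 0 for r in rows):
        findings.append("io-wait")    # wa never drops to zero
    if all(r["pi"] > 0 or r["po"] > 0 for r in rows):
        findings.append("paging")     # pi / po never drop to zero
    return findings


rows = parse(SAMPLES)
print(diagnose(rows))  # ['io-wait']
```

For the three sample rows, us + sy never gets above 80 and pi / po stay at zero, so only the wa rule fires, which is the cue to move on to iostat.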
If your wa column indicates a lot of waiting for I/O to complete, use this command (again while the system is struggling):
iostat 3 20
Like vmstat, this will take a snapshot every 3 seconds for a minute, and will give output like this for each snapshot:
tty:      tin    tout   avg-cpu:  % user  % sys  % idle  % iowait
          9.0   327.7               19.7   44.6    23.7      11.9

Disks:  % tm_act    Kbps   tps  Kb_read  Kb_wrtn
hdisk7       0.0     0.0   0.0        0        0
hdisk6       2.7    25.3   4.0       76        0
hdisk5       0.0     0.0   0.0        0        0
hdisk1      17.7    52.0  10.7        0      156
hdisk0      21.3   362.4  16.7      208      880
hdisk3       1.0    10.7   1.7        0       32
hdisk2       5.0    22.6   4.7       16       52
hdisk4       0.0     0.0   0.0        0        0
cd0          0.0     0.0   0.0        0        0
As a general rule of thumb, your system is I/O bound if % iowait is above 25 and % tm_act is above 70 on any given disk.
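To make that rule concrete, here is a small Python sketch that applies it to the snapshot above. The function name io_bound and its parameter names are hypothetical, chosen just for this example; the numbers are typed in by hand from the iostat output rather than parsed from it.

```python
def io_bound(iowait_pct, disks, iowait_limit=25.0, tm_act_limit=70.0):
    """Return the disks that make this snapshot look I/O bound.

    The rule needs BOTH conditions: % iowait above the limit, and
    % tm_act above the limit on at least one disk.
    """
    if iowait_pct <= iowait_limit:
        return []
    return [name for name, tm_act in disks.items() if tm_act > tm_act_limit]


# Values copied by hand from the iostat snapshot above.
snapshot_iowait = 11.9
snapshot_tm_act = {
    "hdisk7": 0.0, "hdisk6": 2.7, "hdisk5": 0.0,
    "hdisk1": 17.7, "hdisk0": 21.3, "hdisk3": 1.0,
    "hdisk2": 5.0, "hdisk4": 0.0, "cd0": 0.0,
}

print(io_bound(snapshot_iowait, snapshot_tm_act))      # []
print(max(snapshot_tm_act, key=snapshot_tm_act.get))   # hdisk0
```

By this rule the sample snapshot is not I/O bound: % iowait is only 11.9, and the busiest disk, hdisk0, is active just 21.3% of the time. You would need to see the rule hold across most of the twenty snapshots before blaming the disks.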
Hopefully this will give you some pointers to start looking. If you try this on your system and post any output here, I will gladly try and troubleshoot your system for you.
Regards, LHLTech
IBM CS - AIX V4.3 System Support
Halfway through CATE exams!