Hey Gang,
I recently rebuilt our SunBlade 100s with the latest patch clusters, and now the systems are mysteriously locking up. The hangs are inconsistent: it usually takes 24 to 72 hours of idle time for a system to lock, but in a few instances the systems have locked in much less time and with users actively working on them.
I did not have this problem until I updated the patches. However, there are other reasons why I would like to stay at the current patch level. The Sun engineers are stumped. I am hoping you might be able to help me with two things:
1. Any advice from those of you who may have had similar problems.
2. A way to use the monitoring tools to at least get an idea of what is happening on the system when it is dying.
I cannot get a core file because this is a hard hang, and all of the general performance data (memory, virtual memory, I/O, network, etc.) looks normal, or at least shows no consistent abnormalities at the time of the hangs.
At this point I would love to get a tool, script, or anything else that would let me log every process that touches the kernel, if for no other reason than to determine the last 100 processes running immediately before the hang occurs. At the very least, this might tell me which apps/patches are involved in the system hang.
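As a rough illustration of what such a logger might look like, the Python sketch below samples the process table every few seconds, keeps the last 100 newly seen processes in a ring buffer, and forces the log to disk after each sample so that it survives a hard hang. Everything in it is an assumption rather than a known-good tool: the function names, the log path /var/tmp/lastprocs.log, the 5-second interval, and the availability of Python 3.7 or later on the affected machines are all hypothetical. It also only polls `ps -eo pid,comm`, so it is not true kernel-level tracing and will miss processes that start and exit between samples.

```python
import os
import subprocess
import time
from collections import deque


def parse_ps(text):
    """Turn `ps -eo pid,comm` output into (pid, command) tuples, skipping the header."""
    rows = []
    for line in text.splitlines()[1:]:
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].isdigit():
            rows.append((int(parts[0]), parts[1].strip()))
    return rows


def record_new(rows, seen, recent):
    """Append processes absent from the previous sample to the ring buffer.

    Returns the current sample as a set, to be passed in as `seen` next time.
    """
    for row in rows:
        if row not in seen:
            recent.append(row)
    return set(rows)


def flush(recent, path):
    """Rewrite the log and fsync it so the contents survive a hard hang."""
    tmp = path + ".tmp"
    with open(tmp, "w") as fh:
        for pid, comm in recent:
            fh.write("%d %s\n" % (pid, comm))
        fh.flush()
        os.fsync(fh.fileno())
    os.rename(tmp, path)


def main(path="/var/tmp/lastprocs.log", interval=5, keep=100):
    """Poll the process table forever; path, interval and keep are hypothetical defaults."""
    recent = deque(maxlen=keep)  # oldest entries fall off once `keep` is reached
    seen = set()
    while True:
        out = subprocess.run(
            ["ps", "-eo", "pid,comm"], capture_output=True, text=True
        ).stdout
        seen = record_new(parse_ps(out), seen, recent)
        flush(recent, path)
        time.sleep(interval)
```

Calling `main()` from a root shell (or an init script) would start the loop. After a hang and reboot, the log file would list up to the last 100 processes that appeared before the system stopped responding, oldest first.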
I am especially suspicious of the windowing environment (we are using X Windows, NOT CDE), as it has some consistent errors appearing in the log. They are:
1) Workspace Manager: I/O error on display:: :0.0
2) ./.dt/errorlog>>>dtsession: Connection to server lost - exiting.
3) Ramtek Error: Parse_Tplot: CASE_LARGE_CHARS: Unsupported.
Thanks in advance for the help!!
Mike