Few questions about this incident

Status
Not open for further replies.

dman7777777

IS-IT--Management
Jan 13, 2007
52
US
At my work, a few processes hung. We tried kill -9 on their process IDs and they still wouldn't die, so we rebooted. After that, the system would boot, but not fully. We were able to ping the box but could not PuTTY into it. There was no console available for an engineer to use, either. Finally, we found an engineer who got into the box remotely. He discovered corrupted file systems and had to wipe them out. I was wondering....

1) Was there another way to kill these processes that they would have responded to?
2) After rebooting the box, we could ping it but not PuTTY into it. Since we were able to ping it successfully, does this mean that at least the basic operating system was loaded?
3) HOW was that engineer able to get into the box remotely since we couldn't putty into it?

 
1) No. "kill -9" is the hand of god when it comes to terminating a process. If a process doesn't die from that, then you are having kernel problems.

2) Not necessarily. There isn't enough information provided to make a judgement. It is possible for the IP stack to come up (and therefore respond to ping), but not necessarily for inetd or sshd to come up, and you didn't say whether you were using telnet or ssh to connect.

3) You'll have to ask your engineer. My money would be on a modem attached to a serial port, so he'd be getting in via a tty instead of IP.
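
To illustrate point 2, here is a minimal Python sketch that checks whether the login services are accepting TCP connections, which is a separate question from whether the host answers ping. The function names are made up for this example, and the ports are assumed to be the defaults (22 for ssh, 23 for telnet); your box may be configured differently.

```python
import socket

def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False

def check_remote_login(host, ports=(("ssh", 22), ("telnet", 23)), timeout=3.0):
    """Map each service name to whether its (assumed default) port accepts connections."""
    return {name: tcp_port_open(host, port, timeout) for name, port in ports}

# Example call (the hostname is a placeholder, not from this thread):
# print(check_remote_login("aixbox.example.com"))
```

If both entries come back False while ping still works, that fits the picture of the IP stack being up but inetd or sshd never having started.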
 
1) Hmm...interesting. Would this be a problem with the AIX kernel, or is it more likely that the company's software was not written well and therefore does not work well with the kernel?

2) We tried both telnet and ssh.

3) Interesting. I'll never know for sure because I'll never be able to ask him. I work the graveyard shift and it was about 4 in the morning when he had to fix the box. He was steaming mad...but that goes along with the job. I wish I had that much knowledge and could do what he did. Even our other AIX engineer didn't know how to get into the box without a PuTTY session or console.
 
The AIX kernel is pretty solid, but I have seen a few cases where it gets abused to the point that the box has to be rebooted.
 

Chapter11 said:
"
1) No. "kill -9" is the hand of god when it comes to terminating a process. If a process doesn't die from that, then you are having kernel problems.
"

That is completely not true! The kernel is doing the right thing, as the process has been left to hang by its parent. The process has text and data segments associated with it that the parent should have cleaned up.

If the process isn't responding to kill -9, blame the software that did it and log a fault with the company you bought it from.

As for the corrupted file systems, it's unlikely that the process caused them. It's more likely that something hung because the file system got corrupted.
 

Sorry, just to add something else: processes waiting for I/O are uninterruptible, as on any common Unix system, and cannot be killed. It is actually more likely that you had a disk I/O problem that caused the file system to be corrupted and also caused the process to hang in I/O wait.
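
As a rough illustration of spotting such processes, here is a small Python sketch that filters ps-style output for an uninterruptible state. It assumes three columns (PID, state, command) and assumes the state letter is "D", which is what Linux ps uses for uninterruptible sleep; AIX ps reports states differently, so treat the letter as a parameter to adjust. The sample output and the "report_job" name are made up.

```python
def uninterruptible(ps_output, state_letter="D"):
    """Return (pid, state, command) for rows whose state starts with state_letter."""
    rows = []
    # Skip the header line, then split each row into at most three fields
    # so that commands containing spaces stay in one piece.
    for line in ps_output.strip().splitlines()[1:]:
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue
        pid, state, command = fields
        if state.startswith(state_letter):
            rows.append((int(pid), state, command))
    return rows

# Made-up sample in the shape of Linux `ps -eo pid,stat,comm` output.
SAMPLE = """\
  PID STAT COMMAND
    1 Ss   init
  812 D    report_job
  990 R+   ps
"""

print(uninterruptible(SAMPLE))
```

A process that stays in this state across repeated checks is stuck inside the kernel waiting on I/O, which is why kill -9 has no visible effect until the I/O completes or fails.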
 
The software wasn't waiting for any input from the user. It's more of an automation package that kicks off at a certain time and runs reports, etc. Would a corrupted file system cause this software to hang, or would the software itself cause a corrupted file system?
 

I never mentioned user input... Yes, a corrupted file system might cause it to hang, or an I/O problem might have caused both the file system corruption and the application hang.
 
Was there anything in the errpt output that would indicate a problem related to this issue? I would have checked!
 