Why Aren't Defunct Telnet Sessions Going Away?

Status
Not open for further replies.

dpanattoni

IS-IT--Management
Jan 29, 2002
76
US
From time to time, a user will exit their telnet session without correctly logging out. If I run ps, I can still see this session, even days later. If I run top, these defunct processes appear to take up all of the CPU time.
Is there a configuration setting somewhere that I can administer that will dictate that these defunct processes should die?
I am using Redhat Advanced Server 2.1.

Thanks in advance.
DP
 
Hi,
You can pipe the output of your ps command to 'cut' to extract the process numbers, then pipe those to xargs kill -9. You could also set this up in cron to run as often as you want.
 
Thanks for the thought. I didn't want to do that unless I had to. (A little nervous of having a script killing processes automatically)

Is that my only option? Does RedHat not take care of these defunct processes automatically like other UNIX variants?
 
I ask because top shows that these defunct processes are using up all of the CPU time, leaving idle at 0%.
 
I think there is something wrong with your Linux build if defunct processes are taking up CPU. May need to rebuild.
 
What change would I make to the kernel before rebuilding? I have more than 1 RedHat server, and all of them are exhibiting this same situation. Unless I make a change, won't the rebuild just rebuild what I already have?

A point worth mentioning though is:
During this situation, while the CPU idle % is 0, the system response seems to be unaffected. However, the load average goes through the roof. Normally, this system will have a load average of < 1.0. When there are defunct processes on the system, the load average will climb to at least 30.0.
 

What is your kernel version??

Defuncts shouldn't be a big problem and they cannot use the CPU. If they do there is something wrong with the kernel.

I know that most Unix systems should take care of zombies but I have yet to experience a system that actually does this efficiently.

Cheers Henrik Morsing
IBM Certified AIX 4.3 Systems Administration
 
"Defuncts shouldn't be a big problem and they cannot use the CPU. If they do there is something wrong with the kernel."

It is true that a zombie should not consume much kernel time.
However, a process or thread that is not terminated correctly and remains running can consume LOTS of resources.
What happens with zombies is that the parent application did not handle the child's abnormal end correctly by calling waitpid(), et al.

The defunct process then stays in the process table, keeping its pid, until its parent reaps it or, if the parent itself exits, until it is inherited by another process, usually the first or init process.

"I know that most Unix systems should take care of zombies but I have yet to experience a system that actually does this efficiently."

This is an application programming error, not a kernel problem.
The kernel cannot be expected to stick its nose
into every userspace program and do an additional wait()
on process children. It simply doesn't have the
information it needs to do so.
The facility for closing a process properly is
available to the programmer.
Many times however threaded or complicated or
miswritten/ misconfigured programs exit irregularly and cause serious issues.
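
For readers following along, the reaping step described above can be sketched in C. This is a minimal illustration, not code from the thread: the function name run_and_reap is hypothetical, and it assumes a POSIX system. The parent forks a child that exits at once, then collects its status with waitpid(). Between the child's exit and that call, ps would list the child as <defunct>.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits immediately with `code`,
 * then reap it. Returns the child's exit status, or -1 on error. */
int run_and_reap(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;              /* fork failed */
    if (pid == 0)
        _exit(code);            /* child: terminate at once */

    /* Parent: until waitpid() returns, the exited child remains in the
     * process table as a zombie. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A parent that never makes this call leaves every exited child listed as defunct for as long as the parent keeps running.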

 
I appreciate all of the ideas and am looking into each of them:

Regarding the last point, that this is an application problem where the program does not properly handle a user exiting incorrectly: can you elaborate?

If I log on as a userA via a telnet session and run Test_pgm and then disconnect the telnet session by closing down the terminal emulation software, this is what is shown from the ps command:

UID      PID  PPID  C STIME TTY      TIME     CMD
userA   8505     1  0 14:00 ?        00:00:00 login -- userA
userA   8506  8505  0 14:00 ?        00:00:00 -bash
userA   8540  8506 87 14:00 ?        00:00:42 Test_pgm

The Test_pgm has parent processes all the way back up to the original login, but clearly, the login now no longer has a tty associated with it. Based on what you were saying, could this still be an application problem? The Test_pgm is just a simple 'C' program that asks the user to enter a character and then exits.

I am coming from the SCO UNIX world and have not had this problem before.

The kernel version that I am using is 2.4.9.

Thanks again for all of your comments.
 
Oh, one more thing. These defunct processes aren't being listed by top as zombie processes. (I think this is correct, since technically none of these processes has lost its parent.)
 
This makes more sense now. If your test program artificially keeps the fd open, does no checks and is not aware of its
environment it could well keep the session open.

Yes, this type of operation looks like it is the problem.
A sighandler or ipc mechanism as simple as a fifo could
alleviate this issue.
 
OK. Slow down please.

"This makes more sense now. If your test program artificially keeps the fd open, does no checks and is not aware of its environment it could well keep the session open."

What is fd?
What type of checks?

"A sighandler or ipc mechanism as simple as a fifo could
alleviate this issue."

Can you elaborate? I don't know what you're talking about.
 

marsd,
I didn't say it was a kernel problem but the documentation for the AIX and Linux kernel that I have read says that the kernel will remove zombies after a while. I know that it's a programming problem.

Cheers Henrik Morsing
IBM Certified AIX 4.3 Systems Administration
 
fd = file descriptor.

What kind of checks? Well, for starters, possibly
timing out and cleaning up all allocated resources,
or even using the ancient signal() call (see man signal) to
catch SIGTERM, etc., and clean up responsibly.

It's easier to explain this way:
Say I have a program that reads from stdin, gets input,
does some other stuff, and does it all again until a certain condition is met, but does no signal handling
or anything else a more mature program should do.

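#include <stdio.h>
#include <stdlib.h>

/* Defined elsewhere (not shown in this post): returns 1 or 2 when the
 * loop should end. */
int do_commit_query(FILE *efile, char *buf);

int main(void) {
    int y = 0, p;
    FILE *efile;
    char buf[50];

    if ((efile = fopen("/logfilepath", "r")) != NULL) {
        while (1) {
            y++;
            fprintf(stdout, "%s\n", "File entry");
            fflush(stdout);
            /* Note: the return value of fgets() is not checked, so EOF
             * or a read error on stdin goes unnoticed. */
            fgets(buf, 50, stdin);
            p = do_commit_query(efile, buf);
            if (p == 2) {
                fclose(efile);
                exit(1);
            } else if (p == 1) {
                fclose(efile);
                return 0;
            }
        }
    }
    perror("fopen()");
    exit(1);
}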

What happens when I run this through telnet and then
exit the controlling session, say by closing the X window
I was running it in, or by sending Ctrl-C to telnet?

The outcome of this kind of thing is not well defined. Will
telnet be able to clean up all processes when they don't
relinquish the tty? Possibly, but that's not its real
function. What will the kernel do with the login and
shell processes if telnet exits cleanly
and the program does not? We can see that in your
example, I think.

"Can you elaborate? I don't know what you're talking
about."

If you are not the program writer and are unable to
fix the code, or are unable to wrap the whole session
in some sort of wrapper, then
you could possibly follow mrregan's advice.

Otherwise you just have a misbehaving C program that
does bad things and leaves you to clean up.
This is not unheard of, believe me ;)
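
One detail in the loop above may matter here: the return value of fgets() is never checked. If stdin reaches end-of-file, which is one plausible result of the terminal going away (an assumption, not something confirmed in this thread), fgets() returns NULL immediately on every call and the loop spins without ever blocking. A hedged sketch of the same loop with that check added follows. The names read_entries and the stand-in do_commit_query (here, typing "quit" ends the loop) are hypothetical, since the original helper is not shown.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the thread's do_commit_query(), which is
 * never shown: returns 1 when the line starts with "quit", 0 otherwise. */
static int do_commit_query(const char *line)
{
    return strncmp(line, "quit", 4) == 0 ? 1 : 0;
}

/* Prompt on `out` and read entries from `in` until the query says stop
 * or input ends. Returns the number of entries processed. */
int read_entries(FILE *in, FILE *out)
{
    char buf[50];
    int count = 0;

    for (;;) {
        fprintf(out, "File entry\n");
        fflush(out);
        if (fgets(buf, sizeof buf, in) == NULL)
            break;      /* EOF or read error: stop instead of spinning */
        if (do_commit_query(buf) != 0)
            break;      /* the user asked to finish */
        count++;
    }
    return count;
}
```

Called as read_entries(stdin, stdout), the loop ends as soon as the input stream is closed, so the process can exit rather than run on without a terminal.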
 
Thank you marsd,

What you said makes sense, and you have given me some areas of interest to look up.

I have been programming professionally for the past 15 years and am the programmer of the application programs in question. I have never had to deal with this type of problem in the other UNIX variants that I've used and therefore never knew it existed. I have dealt with at least three other variants and when a telnet session is abruptly ended, those systems somehow all are able to close down the processes related to that tty.

As a side note, since reading your post, I have yet to find a system program and/or system utility that does exit properly when the telnet session is abruptly terminated without first properly shutting down the program or utility. (For example, telnet into a RedHat system, run the setup configuration program (setup), and then close the telnet session without first exiting setup. Telnet back to your system and you will see that there is still a session running the setup program and that there is no tty associated with it.)

Because it would seem that this problem is so common, my only option seems to be to write a script to kill those login sessions with no tty attached.

Once again, Thanks for your help and ideas.
DP
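
Before anything is killed automatically, it may help to make the selection rule explicit. The sketch below is hypothetical and deliberately does not call kill(): it only parses one line in the ps format shown earlier in the thread (UID PID PPID C STIME TTY TIME CMD) and reports the PID when the command is login and the TTY column is "?". The function name detached_login_pid is an assumption, and a real cleanup script would need more safeguards than this.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parse one line of ps output in the layout
 * UID PID PPID C STIME TTY TIME CMD. Returns the PID if the line is a
 * `login` process with no terminal (TTY column is "?"), or 0 if the
 * line does not match or cannot be parsed. */
long detached_login_pid(const char *line)
{
    char uid[33], stime[16], tty[16], cputime[16], cmd[64];
    long pid, ppid;
    int c;

    int n = sscanf(line, "%32s %ld %ld %d %15s %15s %15s %63s",
                   uid, &pid, &ppid, &c, stime, tty, cputime, cmd);
    if (n != 8)
        return 0;   /* header line or unexpected format */
    if (strcmp(tty, "?") != 0)
        return 0;   /* still attached to a terminal */
    if (strcmp(cmd, "login") != 0)
        return 0;   /* only the session's login process is of interest */
    return pid;
}
```

Keeping the selection separate from the kill makes it possible to log the candidates for a while and check them by hand before trusting the rule in cron.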
 
It's interesting to me also that this program does not
exit cleanly, and I cannot replicate this kind of
behavior on my own machine except by doing some
counterintuitive things.
Unfortunately, users do counterintuitive things all the
time.

Good Luck.
 
Just an update since you've spent a lot of time on this with me.

I am not able to capture a SIGTERM signal when a telnet session is abnormally logged off. I am beginning to think that this is a RedHat problem ... not sending a terminate signal when it is supposed to.
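
A possibly relevant detail: when a terminal connection is dropped, POSIX systems conventionally report it with SIGHUP rather than SIGTERM, so a handler installed only for SIGTERM may never run. The sketch below is an assumption about what might help, not a confirmed fix for this RedHat setup. It installs one handler for both signals with sigaction(); the names install_exit_handlers, pending_exit_signal and on_terminate are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the handler; polled by the program's main loop. */
static volatile sig_atomic_t got_signal = 0;

static void on_terminate(int signo)
{
    got_signal = signo;     /* only async-signal-safe work here */
}

/* Install the same handler for SIGHUP and SIGTERM.
 * Returns 0 on success, -1 on failure. */
int install_exit_handlers(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_terminate;
    sigemptyset(&sa.sa_mask);
    /* SA_RESTART is left unset so that a blocked read() is interrupted
     * and the loop gets a chance to notice the signal. */
    if (sigaction(SIGHUP, &sa, NULL) < 0)
        return -1;
    if (sigaction(SIGTERM, &sa, NULL) < 0)
        return -1;
    return 0;
}

/* Returns the number of the last signal received, or 0 if none. */
int pending_exit_signal(void)
{
    return got_signal;
}
```

A main loop would call install_exit_handlers() once at startup, then check pending_exit_signal() after each read and clean up when it returns nonzero.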
 
I would take this problem to Usenet's comp.unix.programmer and get some expert opinions on the issue.
I'm sure I, at least, am missing something very basic about
the problem here.
 
Thanks for all of your help. I'll post back whatever I find.
 