Killing Zombies


Christian Olsson
03-20-2002, 02:39 PM
How do I write a handler function that correctly cleans up (reaps) a zombie process using signal()?

signal(SIGCHLD, reaper);

What should reaper() look like? And what is int *status good for? I have looked through the manuals that come with Debian (man 2 signal, kill, wait, waitpid, etc.), but it's a bit hard for me since English isn't my main language and I haven't had much practice with it yet.

edit: I can't solve this with a plain blocking wait(), since the main process has to keep spawning new children when it's supposed to.

[ 20 March 2002: Message edited by: Christian Olsson ]

bwkaz
03-20-2002, 05:17 PM
SIGCHLD gets sent whenever a child exits, and no other times, correct?

Then your signal handler should probably look like this:

#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>

void sighandler(int sig)
{
    int status = 0;
    pid_t childpid;

    if (sig == SIGCHLD) {
        if ((childpid = wait(&status)) < 0) {
            // printf() is not async-signal-safe; it's only here to show what happened
            printf("Error in wait call, %d\n", errno);
        }
        else if (childpid == 0) {
            printf("No child has exited, you shouldn't see this unless you're using waitpid() with the WNOHANG option!\n");
        }
        else {
            printf("Child %d exited\n", (int)childpid);
        }
    }

    // Do nothing for other signals; as long as you don't install this handler for them, this handler won't get called for anything else
}

Basically, this boils down to "whenever a child exits, wait() on it".

The int *status is used to return the status of the child that died. You can use the other macros listed on the wait() manpage to evaluate this, but usually it won't matter.

In this case, WIFEXITED(status) will usually return true (nonzero), and all the others will return false (0); this means that the child exited normally. But it is possible (for example, if someone uses the kill shell command to kill one of the children) that WIFEXITED(status) will be false and WIFSIGNALED(status) will be true, in which case WTERMSIG(status) gives the number of the signal that killed the child.
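
To make the status macros a bit more concrete, here is a minimal sketch (not part of the original handler). The names describe_status() and run_child() are hypothetical helpers made up for this illustration; only the W* macros, fork(), waitpid(), raise() and _exit() are real calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: print what a wait() status means.
   Returns 1 for a normal exit, 2 for death by signal, 0 otherwise. */
int describe_status(int status)
{
    if (WIFEXITED(status)) {
        printf("exited normally with code %d\n", WEXITSTATUS(status));
        return 1;
    }
    if (WIFSIGNALED(status)) {
        printf("killed by signal %d\n", WTERMSIG(status));
        return 2;
    }
    printf("neither exited nor killed (stopped or continued?)\n");
    return 0;
}

/* Hypothetical helper: fork a child that exits with `code`, or, if
   sig != 0, kills itself with `sig`. Returns the child's wait status,
   or -1 if fork() or waitpid() failed. */
int run_child(int code, int sig)
{
    int status = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (sig != 0) {
            signal(sig, SIG_DFL);   /* make sure the default action applies */
            raise(sig);
        }
        _exit(code);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;
}
```

For example, describe_status(run_child(3, 0)) should report a normal exit with code 3, while describe_status(run_child(0, SIGTERM)) should report a child killed by SIGTERM, much like one killed with the kill shell command.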

The "beauty" of wait() is that as long as a child has actually exited (and my assumption is that the signal handler won't be called otherwise, that may not be correct but I think it is...), then wait() doesn't block. So you will always get back control, and you don't have to worry about spawning new children, because that will happen, guaranteed.

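You will also wait() for new processes, unless you signal(SIGCHLD, SIG_IGN); (which POSIX.1-1990 did not allow, and which is a VERY BAD IDEA anyway if you want portable behavior). This works because a SIGCHLD is raised whenever a child exits, and you call wait() every time you get the SIGCHLD, so you should be cleaning up all your zombies. One caveat: ordinary signals are not queued, so if several children exit at nearly the same time you may get only one SIGCHLD for all of them.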

....

Hopefully. If the program seems to be blocking on wait() in the signal handler, then I'd suggest using waitpid() instead, with a pid of -1, the same status, and an option of WNOHANG, like this:

if((childpid = waitpid(-1, &status, WNOHANG)) < 0) {

waitpid() has the same return value semantics as wait(), except that with WNOHANG it returns 0 instead of hanging (blocking) when no child has exited yet. You may also want to look into or'ing WUNTRACED with WNOHANG, which makes waitpid() also report children that are stopped (as in paused, for example by SIGSTOP, not terminated) and whose status has not been reported since they stopped.
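
Putting the waitpid() idea together, here is a rough sketch of a reaper() for the original question. It is one possible arrangement and not the only way to do it. Assumptions: it installs the handler with sigaction() instead of signal() (a swap on my part, since sigaction() behaves more predictably across systems), it calls waitpid() in a loop because one SIGCHLD may stand for several exited children, and install_reaper() and spawn_and_reap() are hypothetical names used only for the demo.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Number of children reaped so far; sig_atomic_t so the handler can update it. */
static volatile sig_atomic_t reaped = 0;

/* SIGCHLD handler: collect every child that has already exited, never block. */
static void reaper(int sig)
{
    int saved_errno = errno;    /* waitpid() may overwrite errno */
    int status;

    (void)sig;
    while (waitpid(-1, &status, WNOHANG) > 0)
        reaped++;
    errno = saved_errno;
}

/* Hypothetical helper: install reaper() for SIGCHLD. Returns 0 on success. */
int install_reaper(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = reaper;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;   /* ignore stop/continue events */
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Hypothetical demo: spawn n children that exit at once, then keep
   "working" until the handler has reaped them all (or about 5 seconds pass).
   Returns the number of children reaped, or -1 on error. */
int spawn_and_reap(int n)
{
    struct timespec nap = { 0, 1000000 };   /* 1 ms */
    int i, tries;

    reaped = 0;
    if (install_reaper() < 0)
        return -1;
    for (i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(0);           /* child: exit immediately */
    }
    for (tries = 0; reaped < n && tries < 5000; tries++)
        nanosleep(&nap, NULL);  /* the main program would do real work here */
    return reaped;
}
```

The main process never blocks in wait() here, so it stays free to spawn new children whenever it needs to; the handler only touches a counter and calls waitpid(), both of which are safe inside a signal handler.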