Forum OpenACS Development: Re: Possible 10g Solaris Problem

Posted by Andrew Piskorski on
Barry, did you try the Tcl exec from your AOLserver without your Oracle driver loaded at all? If it works then, with no other changes, you're probably right that Oracle has something to do with it.

When you do load nsoracle, what happens if you try a Tcl exec after connecting successfully to Oracle but before doing any database queries at all? Does the exec work then?

Also, which versions of AOLserver, Tcl, and nsoracle are you using?

The OTN link you gave above does not seem to work (due to some stupid Oracle session tracking stuff in the URL?); this one seems OK.

The guy posting the problem on OTN was awfully vague. He doesn't even say what version of Oracle he's running, never mind other important info like which of Oracle's three different connect methods he's using. Perhaps someone intimately familiar with OCI could infer what specifically he must have meant, but it all sounds sketchy to me:

I encountered the "defunct" problem when I try to connect oracle database using OCI. The connection is successful but after each connection there's a defunct process left in the system, which leads to the exhaust of system process resources. Any ideas or suggestion?

[...]

This problem has been settled. After each oci operation, the created database thread would send a signal SIGCHLD to the parent process to indicate the child process has finished doing the requested operation. In my former program I didn't do anything with this signal, which at last led to the child process becoming a defunct process, and the total number of which would increase as more and more OCI operations are called. That's why I've got so many defunct processes.

It's easy to settle this problem. You should handle the signal SIGCHLD in your process which performing the data base operations. The simplest way is adding the following lines into the program:

signal(SIGCHLD, SIG_IGN);

That's all. Hope the above information would be helpful to other people.

"created database thread", huh? Since when is there any such thing? Perhaps he means the Oracle connection process, and that (to translate into the OpenACS case) it is now sending SIGCHLD to the AOLserver process.

If so, and this is new behavior on Oracle's part, then some sort of signal handler somewhere in AOLserver probably does need to be changed to deal with this. I couldn't tell you where or how, though. Also, maybe there is some way to get Oracle OCI not to generate the signal in the first place?