Forum OpenACS Q&A: OpenACS and Database Locks - Conflict?
I have this weird something and couldn't reach to a reasonable explanation.
First of all, I run OpenACS 5 (W .LRN) Over RH AS2.1/Oracle220.127.116.11
I have another back-end of Oracle Applications connected to the Same Web Database (A fertile environment for lock conflict)
Anyhow, Sometimes the Web Server hangs vaguely (for a while only) then it returns normal.
I suspected that a lock somehow (from the other back-end) is preventing the server from doing something... (Bu, why the whole server!!)
Here are some notices:
The server log when in hang status
[23/Mar/2004:07:51:35][16469.1228820][-sched:idle3-] Notice: starting
[23/Mar/2004:07:51:36][16469.1245205][-sched:idle4-] Notice: starting
[23/Mar/2004:07:51:37][16469.1261590][-sched:idle5-] Notice: starting
[23/Mar/2004:07:52:50][16469.1277975][-sched:idle6-] Notice: starting
[23/Mar/2004:08:27:50][16469.32771][-sched-] Warning: sched: excessive time taken by proc 6 (9 seconds)
[23/Mar/2004:08:28:01][16469.32771][-sched-] Warning: sched: excessive time taken by proc 10 (9 seconds)
I greped the whole acs directory and found no procedure who echoes this "starting" alone
I've monitored the locks while the server hanging and found that some locks are places on some tables used by the web user.
And the normal return of the web server comes after those locks end.
So it is obvious somehow that these locks are the primary accused reason??
My other notice (and I dont know if it is related by anyhow) is that the OpenACS database user places a lock on BULK_MAIL_MESSAGES frequently. (and sometimes this lock stays for ever)
Can anybody light the thing up??
I just debugged a situation last night in one of the sites we host. A thread was crashing during a transaction, and due to some custom error code (at least I think that was the reason) execution was never returning to the original script, so the transaction was being left open. It didn't take long after this happened for other threads to start stacking up, needing to update a table that was being held hostage by that transaction, and very quickly everything ground to a screeching halt.
However, you have to keep Tom Kyte's pointers on when NOT to use them:
* You do not delete from the parent table.
* You do not update the parent table's unique/primary key value, either purposely or by accident(via a tool).
* You do not join from the parent table to the child table, or more generally - the foreign key columns do not support an important access path to the child table.
You say during the hang you observed "some locks on some tables". Well, what tables? And what code was taking those locks?
As others have suggested above, so far there is also little reason to believe that database locks have anything to do with your problem. They might, they might not. Verify that your AOLserver thread settings are correct. Use that AOLserver nstelemetry page to check for any suspicious AOLserver locks during the hang.
That 9 seconds taken by that scheduled proc seems suspicious. What proc is that, and what is it doing? Instrument all your scheduled procs if necessary in order to find out. Even it turns out to be un-related to your hanging problem, you probably should fix it anyway. Either speed up the scheduled proc, or give it its own thread to run in.
I Guess AOL Server and Linux themselves are not accused in this case, top shows a normal server behavior during the hang..
Heres the server threads settings any how:
ns_param maxconnections 20
ns_param maxdropped 0
ns_param maxthreads 10
ns_param minthreads 0
ns_param threadtimeout 120
This type of problem happens very rarely, and more clearly, when there is a big count of users (400+) using the web-end for Writing.
Scheduled Procedures run normally when the above condition disappears... Also, when users are purely reading from the web, Ive got no problem also.
My other OpenACS is accessing the other oracle application schemas. And during the hang, mostly, the locked tables are those of the other schemas.
A question out of this scope... Does anybody test Oracle 10G with OpenACS??
I am playing around to test the compatibility, 10G sounds great, but I am having some weird things also.
I dont exactly know why the new 10G enterprise manager shows NO locks list anymore...
Anyhow, I am a having a High CPU consumption by the nsd process after a while running AOL Server 4.01 Over 10G, Still cant figure why...
But Ill sure have more clues in while