Forum OpenACS Q&A: OpenACS and Database Locks - Conflict?

Hello Guys,

I have this weird something and couldn't reach to a reasonable explanation.

First of all, I run OpenACS 5 (W .LRN) Over RH AS2.1/Oracle9.2.0.4
I have another back-end of Oracle Applications connected to the Same Web Database (A fertile environment for lock conflict)

Anyhow, Sometimes the Web Server hangs vaguely (for a while only) then it returns normal.

I suspected that a lock somehow (from the other back-end) is preventing the server from doing something... (Bu, why the whole server!!)

Here are some notices:

The server log when in hang status

[23/Mar/2004:07:51:35][16469.1228820][-sched:idle3-] Notice: starting
[23/Mar/2004:07:51:36][16469.1245205][-sched:idle4-] Notice: starting
[23/Mar/2004:07:51:37][16469.1261590][-sched:idle5-] Notice: starting
[23/Mar/2004:07:52:50][16469.1277975][-sched:idle6-] Notice: starting
[23/Mar/2004:08:27:50][16469.32771][-sched-] Warning: sched: excessive time taken by proc 6 (9 seconds)
[23/Mar/2004:08:28:01][16469.32771][-sched-] Warning: sched: excessive time taken by proc 10 (9 seconds)

I greped the whole acs directory and found no procedure who echoes this "starting" alone

I've monitored the locks while the server hanging and found that some locks are places on some tables used by the web user.
And the normal return of the web server comes after those locks end.
So it is obvious somehow that these locks are the primary accused reason??

My other notice (and I don’t know if it is related by anyhow) is that the OpenACS database user places a lock on BULK_MAIL_MESSAGES frequently. (and sometimes this lock stays for ever)

Can anybody light the thing up??

Collapse
Posted by Brian Fenton on
If the whole webserver is hanging it sounds like AOLserver is reaching its maximum number of threads. What are your maxthreads, minthreads and maxconnections settings? I think each scheduled proc will take its own thread.

Brian

Collapse
Posted by Janine Ohmer on
I don't have much to offer as far as why your site is hanging, but I can confirm that one locked thread can cause the whole site to hang.

I just debugged a situation last night in one of the sites we host.  A thread was crashing during a transaction, and due to some custom error code (at least I think that was the reason) execution was never returning to the original script, so the transaction was being left open.  It didn't take long after this happened for other threads to start stacking up, needing to update a table that was being held hostage by that transaction, and very quickly everything ground to a screeching halt.

Collapse
Posted by Andrew Grumet on
You might find the "nstelemetry.adp" page useful, if you haven't tried it already.

http://cvs.sourceforge.net/viewcvs.py/aolserver/nstelemetry/

Collapse
Posted by Vamsee Kanakala on
I don't know if this helps much, but one common cause of database locks is unindexed foreign keys. If you are deleting from the parent table or the referenced field is getting updated frequently, the whole child table gets locked each and every time. So, it's a good idea to have indexes on foreign keys.

However, you have to keep Tom Kyte's pointers on when NOT to use them:

* You do not delete from the parent table.
* You do not update the parent table's unique/primary key value, either purposely or by accident(via a tool).
* You do not join from the parent table to the child table, or more generally - the foreign key columns do not support an important access path to the child table.

Regards,
Vamsee.

Collapse
Posted by Andrew Piskorski on
Samer, is your non-OpenACS "another back-end of Oracle Applications" accessing your OpenACS Oracle schema in any way? If it is not then AFAIK there is no possible way that it's presence in your Oracle instance could be causing transaction locks. It might make your Oracle instance slower, yes, and might consume other Oracle-internal resources ("latches", etc.) which might conceivable cause the symptoms you observe, but it certainly can't cause table or row locking in your OpenACS schema unless it actually reads from that schema in some way. At any rate that's not the place to start looking for the problem.

You say during the hang you observed "some locks on some tables". Well, what tables? And what code was taking those locks?

As others have suggested above, so far there is also little reason to believe that database locks have anything to do with your problem. They might, they might not. Verify that your AOLserver thread settings are correct. Use that AOLserver nstelemetry page to check for any suspicious AOLserver locks during the hang.

That 9 seconds taken by that scheduled proc seems suspicious. What proc is that, and what is it doing? Instrument all your scheduled procs if necessary in order to find out. Even it turns out to be un-related to your hanging problem, you probably should fix it anyway. Either speed up the scheduled proc, or give it its own thread to run in.

Collapse
Posted by Samer Abukhait on
Well...

I Guess AOL Server and Linux themselves are not accused in this case, top shows a normal server behavior during the hang..

Here’s the server threads settings any how:
        ns_param        maxconnections          20
        ns_param        maxdropped              0
        ns_param        maxthreads              10
        ns_param        minthreads              0
        ns_param        threadtimeout          120

This type of problem happens very rarely, and more clearly, when there is a big count of users (400+) using the web-end for “Writing”.

Scheduled Procedures run normally when the above condition disappears... Also, when users are purely reading from the web, I’ve got no problem also.

My other OpenACS is accessing the other oracle application schemas. And during the hang, mostly, the locked tables are those of the other schemas.

A question out of this scope... Does anybody test Oracle 10G with OpenACS??

I am playing around to test the compatibility, 10G sounds great, but I am having some weird things also.
I don’t exactly know why the new 10G enterprise manager shows NO locks list anymore...
Anyhow, I am a having a High CPU consumption by the nsd process after a while running AOL Server 4.01 Over 10G, Still can’t figure why...

But I’ll sure have more clues in while

Collapse
Posted by Andrew Piskorski on
Your AOLserver thread settings are wrong. To start with try setting minthreads to 10 (same as your maxthreads) and threadtimeout to 60*60 or so. If you do that your problem will go away. See this thread for more info.