Forum OpenACS Q&A: OpenACS and Database Locks

1: OpenACS and Database Locks - Conflict?

Posted by Samer Abukhait on 03/23/04 09:53 AM

Hello Guys,

I have this weird something and couldn't reach to a reasonable explanation.

First of all, I run OpenACS 5 (W .LRN) Over RH AS2.1/Oracle9.2.0.4
I have another back-end of Oracle Applications connected to the Same Web Database (A fertile environment for lock conflict)

Anyhow, Sometimes the Web Server hangs vaguely (for a while only) then it returns normal.

I suspected that a lock somehow (from the other back-end) is preventing the server from doing something... (Bu, why the whole server!!)

Here are some notices:

The server log when in hang status

[23/Mar/2004:07:51:35][16469.1228820][-sched:idle3-] Notice: starting
[23/Mar/2004:07:51:36][16469.1245205][-sched:idle4-] Notice: starting
[23/Mar/2004:07:51:37][16469.1261590][-sched:idle5-] Notice: starting
[23/Mar/2004:07:52:50][16469.1277975][-sched:idle6-] Notice: starting
[23/Mar/2004:08:27:50][16469.32771][-sched-] Warning: sched: excessive time taken by proc 6 (9 seconds)
[23/Mar/2004:08:28:01][16469.32771][-sched-] Warning: sched: excessive time taken by proc 10 (9 seconds)

I greped the whole acs directory and found no procedure who echoes this "starting" alone

I've monitored the locks while the server hanging and found that some locks are places on some tables used by the web user.
And the normal return of the web server comes after those locks end.
So it is obvious somehow that these locks are the primary accused reason??

My other notice (and I dont know if it is related by anyhow) is that the OpenACS database user places a lock on BULK_MAIL_MESSAGES frequently. (and sometimes this lock stays for ever)

Can anybody light the thing up??

2: Re: OpenACS and Database Locks - Conflict? (response to 1)

Posted by Brian Fenton on 03/23/04 05:48 PM

If the whole webserver is hanging it sounds like AOLserver is reaching its maximum number of threads. What are your maxthreads, minthreads and maxconnections settings? I think each scheduled proc will take its own thread.

Brian

3: Re: OpenACS and Database Locks - Conflict? (response to 1)

Posted by Janine Ohmer on 03/23/04 05:57 PM

I don't have much to offer as far as why your site is hanging, but I can confirm that one locked thread can cause the whole site to hang.

I just debugged a situation last night in one of the sites we host. A thread was crashing during a transaction, and due to some custom error code (at least I think that was the reason) execution was never returning to the original script, so the transaction was being left open. It didn't take long after this happened for other threads to start stacking up, needing to update a table that was being held hostage by that transaction, and very quickly everything ground to a screeching halt.

4: Re: OpenACS and Database Locks - Conflict? (response to 1)

Posted by Andrew Grumet on 03/24/04 12:57 PM

You might find the "nstelemetry.adp" page useful, if you haven't tried it already.

http://cvs.sourceforge.net/viewcvs.py/aolserver/nstelemetry/

5: Re: OpenACS and Database Locks - Conflict? (response to 1)

Posted by Vamsee Kanakala on 03/26/04 06:22 AM

I don't know if this helps much, but one common cause of database locks is unindexed foreign keys. If you are deleting from the parent table or the referenced field is getting updated frequently, the whole child table gets locked each and every time. So, it's a good idea to have indexes on foreign keys.

However, you have to keep Tom Kyte's pointers on when NOT to use them:

* You do not delete from the parent table.
* You do not update the parent table's unique/primary key value, either purposely or by accident(via a tool).
* You do not join from the parent table to the child table, or more generally - the foreign key columns do not support an important access path to the child table.

Regards,
Vamsee.

6: Re: OpenACS and Database Locks - Conflict? (response to 1)

Posted by Andrew Piskorski on 04/04/04 03:06 PM

Samer, is your non-OpenACS "another back-end of Oracle Applications" accessing your OpenACS Oracle schema in any way? If it is not then AFAIK there is no possible way that it's presence in your Oracle instance could be causing transaction locks. It might make your Oracle instance slower, yes, and might consume other Oracle-internal resources ("latches", etc.) which might conceivable cause the symptoms you observe, but it certainly can't cause table or row locking in your OpenACS schema unless it actually reads from that schema in some way. At any rate that's not the place to start looking for the problem.

You say during the hang you observed "some locks on some tables". Well, what tables? And what code was taking those locks?

As others have suggested above, so far there is also little reason to believe that database locks have anything to do with your problem. They might, they might not. Verify that your AOLserver thread settings are correct. Use that AOLserver nstelemetry page to check for any suspicious AOLserver locks during the hang.

That 9 seconds taken by that scheduled proc seems suspicious. What proc is that, and what is it doing? Instrument all your scheduled procs if necessary in order to find out. Even it turns out to be un-related to your hanging problem, you probably should fix it anyway. Either speed up the scheduled proc, or give it its own thread to run in.

7: Re: OpenACS and Database Locks - Conflict? (response to 1)

Posted by Samer Abukhait on 04/04/04 04:56 PM

Well...

I Guess AOL Server and Linux themselves are not accused in this case, top shows a normal server behavior during the hang..

Heres the server threads settings any how:
ns_param maxconnections 20
ns_param maxdropped 0
ns_param maxthreads 10
ns_param minthreads 0
ns_param threadtimeout 120

This type of problem happens very rarely, and more clearly, when there is a big count of users (400+) using the web-end for Writing.

Scheduled Procedures run normally when the above condition disappears... Also, when users are purely reading from the web, Ive got no problem also.

My other OpenACS is accessing the other oracle application schemas. And during the hang, mostly, the locked tables are those of the other schemas.

A question out of this scope... Does anybody test Oracle 10G with OpenACS??

I am playing around to test the compatibility, 10G sounds great, but I am having some weird things also.
I dont exactly know why the new 10G enterprise manager shows NO locks list anymore...
Anyhow, I am a having a High CPU consumption by the nsd process after a while running AOL Server 4.01 Over 10G, Still cant figure why...

But Ill sure have more clues in while

8: Your thread settings are wrong (response to 7)

Posted by Andrew Piskorski on 04/05/04 06:00 AM

Your AOLserver thread settings are wrong. To start with try setting minthreads to 10 (same as your maxthreads) and threadtimeout to 60*60 or so. If you do that your problem will go away. See this thread for more info.

Forum OpenACS Q&A: OpenACS and Database Locks - Conflict?