Forum OpenACS Development: Re: Deadlocks

Collapse
4: Re: Deadlocks (response to 1)
Posted by Malte Sussdorff on
GRRR......

Apparently even the content::item::new deadlock does not work fine. Should we get it directly into the content_item__new plsql function instead of calling it separately in TCL ?

==================

ERROR: deadlock detected
DETAIL: Process 1365 waits for ShareRowExclusiveLock on relation 17591 of database 17233; blocked by process 1363.
Process 1363 waits for ShareLock on transaction 1028622; blocked by process 1365.

SQL: LOCK TABLE acs_objects IN SHARE ROW EXCLUSIVE MODE
while executing
"ns_pg_bind dml nsdb0 {LOCK TABLE acs_objects IN SHARE ROW EXCLUSIVE MODE}"
("uplevel" body line 1)
invoked from within
"uplevel $ulevel [list ns_pg_bind $type $db $sql]"
("postgresql" arm line 2)
invoked from within
"switch $driverkey {
oracle {
return [uplevel $ulevel [list ns_ora $type $db $sql] $args]
}
..."
invoked from within
"db_exec dml $db $full_statement_name $sql"
("uplevel" body line 2)
invoked from within
"uplevel 1 $code_block "
invoked from within
"db_with_handle -dbn $dbn db {
db_exec dml $db $full_statement_name $sql
}"
(procedure "db_dml" line 100)
invoked from within
"db_dml lock_objects "LOCK TABLE acs_objects IN SHARE ROW EXCLUSIVE MODE""
("uplevel" body line 2)
invoked from within
"uplevel 1 $transaction_code "
(procedure "db_transaction" line 39)
invoked from within
"db_transaction {
db_dml lock_objects "LOCK TABLE acs_objects IN SHARE ROW EXCLUSIVE MODE"
set item_id [package_exec_plsql -var_list $var_list cont..."

Collapse
5: Re: Deadlocks (response to 4)
Posted by Malte Sussdorff on
Moving it into the plsql would probably help in this case as well:

16/Nov/2006:14:16:02
Error: Ns_PgExec: result status: 7 message: ERROR: deadlock detected
DETAIL: Process 1365 waits for ShareLock on transaction 1032767; blocked by process 1364.
Process 1364 waits for ShareRowExclusiveLock on relation 17591 of database 17233; blocked by process 1365.
CONTEXT: SQL statement "SELECT 1 FROM ONLY "public"."acs_objects" x WHERE "object_id" = $1 FOR UPDATE OF x"
SQL statement "insert into acs_object_context_index (object_id, ancestor_id, n_generations) select $1 as object_id, ancestor_id, n_generations + 1 as n_generations from acs_object_context_index where object_id = $2 "
PL/pgSQL function "acs_objects_context_id_in_tr" line 10 at SQL statement
SQL statement "insert into acs_objects (object_id, object_type, title, package_id, context_id, creation_date, creation_user, creation_ip, security_inherit_p) values ( $1 , $2 , $3 , $4 , $5 , $6 , $7 , $8 , $9 )"
PL/pgSQL function "acs_object__new" line 40 at SQL statement
PL/pgSQL function "content_item__new" line 77 at assignment

====

Yes, this is all of the error 😊

Collapse
7: Re: Deadlocks (response to 4)
Posted by Gustaf Neumann on
Should we get it directly into the content_item__new plsql
function instead of calling it separately in TCL ?

In case it was not clear, my suggestion was to add the lock statement to the three mentioned stored procedures.

Collapse
8: Re: Deadlocks (response to 7)
Posted by Malte Sussdorff on
Gustaf, I totally understand what you suggested, these were two *new* deadlocks and actually caused by calling content::item::new and content::revision::new which has a separate db_dml for the locking.

I will but it directly into content_item__new and content_revision__new then.

We are not going to test 8.2beta3 as it is still beta and we are running a production system. On the development sites it never happens.

What are you using for stress tests ?

Collapse
9: Re: Deadlocks (response to 8)
Posted by Gustaf Neumann on
Having the locks in the tcl-layer does not necessarily help, when one stored procedures calls other stored procedures and bypasses the lock. This can certainly lead to the lock-conflict you are seeing.

I would expect that when you add the lock statement into the three (not two) mentioned stored procedures, the problem will go away (the lock statements can the be removed from the tcl code).

I can be certainly wrong with that assumption, which is purely hypothetical, since i have not tried to reproduce the error you are seeing (i have just a script for stress-testing xowiki).