Forum OpenACS Development: Deadlocks

Collapse
Posted by Malte Sussdorff on
In this thread I will post the deadlocks I encountered on 5.3 on a fairly busy site which I hope to get fixed using the method Gustav and Rocael suggested earlier (https://openacs.org/forums/message-view?message_id=461759). If you have an idea what could be done about the deadlocks apart from locking acs_objects in share mode, I would be happy 😊.

Do we need to put the share lock everywhere where we have an insert into acs_objects ?

=======

Error: Ns_PgExec: result status: 7 message: ERROR: deadlock detected
DETAIL: Process 21202 waits for ShareLock on transaction 297574; blocked by process 21163.
Process 21163 waits for ShareRowExclusiveLock on relation 17591 of database 17233; blocked by process 21202.
CONTEXT: SQL statement "SELECT 1 FROM ONLY "public"."acs_objects" x WHERE "object_id" = $1 FOR UPDATE OF x"
SQL statement "insert into acs_object_context_index (object_id, ancestor_id, n_generations) select $1 as object_id, ancestor_id, n_generations + 1 as n_generations from acs_object_context_index where object_id = $2 "
PL/pgSQL function "acs_objects_context_id_in_tr" line 10 at SQL statement
SQL statement "insert into acs_objects (object_id, object_type, title, package_id, context_id, creation_date, creation_user, creation_ip, security_inherit_p) values ( $1 , $2 , $3 , $4 , $5 , $6 , $7 , $8 , $9 )"
PL/pgSQL function "acs_object__new" line 40 at SQL statement
PL/pgSQL function "acs_rel__new" line 13 at assignment
PL/pgSQL function "membership_rel__new" line 11 at assignment

========

Error: Ns_PgExec: result status: 7 message: ERROR: deadlock detected
DETAIL: Process 21204 waits for ShareLock on transaction 145694; blocked by process 21163.
Process 21163 waits for ShareLock on transaction 145856; blocked by process 21204.
CONTEXT: SQL statement "SELECT 1 FROM ONLY "public"."acs_objects" x WHERE "object_id" = $1 FOR UPDATE OF x"
SQL statement "insert into acs_object_context_index (object_id, ancestor_id, n_generations) select $1 as object_id, ancestor_id, n_generations + 1 as n_generations from acs_object_context_index where object_id = $2 "
PL/pgSQL function "acs_objects_context_id_in_tr" line 10 at SQL statement
SQL statement "insert into acs_objects (object_id, object_type, title, package_id, context_id, creation_date, creation_user, creation_ip, security_inherit_p) values ( $1 , $2 , $3 , $4 , $5 , $6 , $7 , $8 , $9 )"
PL/pgSQL function "acs_object__new" line 40 at SQL statement
PL/pgSQL function "content_revision__new" line 30 at assignment
PL/pgSQL function "content_revision__new" line 14 at return
PL/pgSQL function "contact_party_revision__new" line 18 at assignment

============

Error: Ns_PgExec: result status: 7 message: ERROR: deadlock detected
DETAIL: Process 21203 waits for ShareLock on transaction 279773; blocked by process 21163.
Process 21163 waits for ShareRowExclusiveLock on relation 17591 of database 17233; blocked by process 21203.
CONTEXT: SQL statement "SELECT 1 FROM ONLY "public"."acs_objects" x WHERE "object_id" = $1 FOR UPDATE OF x"
SQL statement "insert into acs_object_context_index (object_id, ancestor_id, n_generations) select $1 as object_id, ancestor_id, n_generations + 1 as n_generations from acs_object_context_index where object_id = $2 "
PL/pgSQL function "acs_objects_context_id_in_tr" line 10 at SQL statement
SQL statement "insert into acs_objects (object_id, object_type, title, package_id, context_id, creation_date, creation_user, creation_ip, security_inherit_p) values ( $1 , $2 , $3 , $4 , $5 , $6 , $7 , $8 , $9 )"
PL/pgSQL function "acs_object__new" line 40 at SQL statement
PL/pgSQL function "content_item__new" line 85 at assignment
PL/pgSQL function "content_folder__new" line 34 at assignment

Collapse
2: Re: Deadlocks (response to 1)
Posted by Gustaf Neumann on
oh no, not again. Which application caused this?

Your examples show deadlocks on

  • insert into acs_object_context_index
triggered from
  • acs_rel__new, membership_rel__new, and
  • contact_party_revision__new, content_revision__new, and
  • content_folder__new, content_item__new
For performance reasons, we do not want to lock the most inner part (acs_objects, acs_object_context_index), but the higher level operations. We have locks already for content_item_new and content_revsion_new, but only on the tcl level. In your cases 2 and 3, the stored procedures are called directly from other stored procedures.

Please, do the following: add
LOCK TABLE acs_objects IN SHARE ROW EXCLUSIVE MODE
to the stored procedures content_revision__new, content_item__new and acs_rel__new, and delete the locks from tcl. This should not hurt performance to much and your problem should go away.

Collapse
3: Re: Deadlocks (response to 2)
Posted by Malte Sussdorff on
I solved the problem with content_folder__new by making content::folder::new call content::item::new directly.

I solved content_party_revision__new by calling content::revision::new within contact::party::revision.

I added the lock to acs_rel__new and commit on HEAD.

Collapse
4: Re: Deadlocks (response to 1)
Posted by Malte Sussdorff on
GRRR......

Apparently even the content::item::new deadlock does not work fine. Should we get it directly into the content_item__new plsql function instead of calling it separately in TCL ?

==================

ERROR: deadlock detected
DETAIL: Process 1365 waits for ShareRowExclusiveLock on relation 17591 of database 17233; blocked by process 1363.
Process 1363 waits for ShareLock on transaction 1028622; blocked by process 1365.

SQL: LOCK TABLE acs_objects IN SHARE ROW EXCLUSIVE MODE
while executing
"ns_pg_bind dml nsdb0 {LOCK TABLE acs_objects IN SHARE ROW EXCLUSIVE MODE}"
("uplevel" body line 1)
invoked from within
"uplevel $ulevel [list ns_pg_bind $type $db $sql]"
("postgresql" arm line 2)
invoked from within
"switch $driverkey {
oracle {
return [uplevel $ulevel [list ns_ora $type $db $sql] $args]
}
..."
invoked from within
"db_exec dml $db $full_statement_name $sql"
("uplevel" body line 2)
invoked from within
"uplevel 1 $code_block "
invoked from within
"db_with_handle -dbn $dbn db {
db_exec dml $db $full_statement_name $sql
}"
(procedure "db_dml" line 100)
invoked from within
"db_dml lock_objects "LOCK TABLE acs_objects IN SHARE ROW EXCLUSIVE MODE""
("uplevel" body line 2)
invoked from within
"uplevel 1 $transaction_code "
(procedure "db_transaction" line 39)
invoked from within
"db_transaction {
db_dml lock_objects "LOCK TABLE acs_objects IN SHARE ROW EXCLUSIVE MODE"
set item_id [package_exec_plsql -var_list $var_list cont..."

Collapse
5: Re: Deadlocks (response to 4)
Posted by Malte Sussdorff on
Moving it into the plsql would probably help in this case as well:

16/Nov/2006:14:16:02
Error: Ns_PgExec: result status: 7 message: ERROR: deadlock detected
DETAIL: Process 1365 waits for ShareLock on transaction 1032767; blocked by process 1364.
Process 1364 waits for ShareRowExclusiveLock on relation 17591 of database 17233; blocked by process 1365.
CONTEXT: SQL statement "SELECT 1 FROM ONLY "public"."acs_objects" x WHERE "object_id" = $1 FOR UPDATE OF x"
SQL statement "insert into acs_object_context_index (object_id, ancestor_id, n_generations) select $1 as object_id, ancestor_id, n_generations + 1 as n_generations from acs_object_context_index where object_id = $2 "
PL/pgSQL function "acs_objects_context_id_in_tr" line 10 at SQL statement
SQL statement "insert into acs_objects (object_id, object_type, title, package_id, context_id, creation_date, creation_user, creation_ip, security_inherit_p) values ( $1 , $2 , $3 , $4 , $5 , $6 , $7 , $8 , $9 )"
PL/pgSQL function "acs_object__new" line 40 at SQL statement
PL/pgSQL function "content_item__new" line 77 at assignment

====

Yes, this is all of the error 😊

Collapse
7: Re: Deadlocks (response to 4)
Posted by Gustaf Neumann on
Should we get it directly into the content_item__new plsql
function instead of calling it separately in TCL ?

In case it was not clear, my suggestion was to add the lock statement to the three mentioned stored procedures.

Collapse
8: Re: Deadlocks (response to 7)
Posted by Malte Sussdorff on
Gustaf, I totally understand what you suggested, these were two *new* deadlocks and actually caused by calling content::item::new and content::revision::new which has a separate db_dml for the locking.

I will but it directly into content_item__new and content_revision__new then.

We are not going to test 8.2beta3 as it is still beta and we are running a production system. On the development sites it never happens.

What are you using for stress tests ?

Collapse
9: Re: Deadlocks (response to 8)
Posted by Gustaf Neumann on
Having the locks in the tcl-layer does not necessarily help, when one stored procedures calls other stored procedures and bypasses the lock. This can certainly lead to the lock-conflict you are seeing.

I would expect that when you add the lock statement into the three (not two) mentioned stored procedures, the problem will go away (the lock statements can the be removed from the tcl code).

I can be certainly wrong with that assumption, which is purely hypothetical, since i have not tried to reproduce the error you are seeing (i have just a script for stress-testing xowiki).

Collapse
6: Re: Deadlocks (response to 1)
Posted by Rocael Hernández Rizzardini on
have you tried 8.2?
will be good to have the stress tests cases available so we can see if it is reproduced again in future version of PG.