Forum OpenACS Development: aolserver/postgres hang
When I uninstall a package aolserver hangs on some drop table commands - which is in the middle of a db_transaction - so a postgres thread hangs. The tables that were in the transaction that have not been droped yet are locked (as expected). So to recover I have to kill aolserver and the postgres thread with the lock (waiting for drop table). As far as I can tell there is no good reason for this - I can manually drop the tables - so there are no dependencies.
I suspect that the problem is the nspostgres driver. I had serveral compiler warnings "warning: passing arg 3 of `Tcl_DStringAppend' as signed due to prototype" which I wasn't able to get rid of - nor was I able to find a satisfactory solution in any of the forums.
Has anyone seen this problem? Has anyone come up with a solution to the nspostgres compile problems.
I can't say I remember calling a drop table from within a transaction, but I just tried it with the same versions as you have above (on redhat 9), and at least for a test table with two rows, droping the table within a transaction worked fine.
I don't remember seeing those compiler warnings either - do you have the version of nspostgres as referred to by:
I had to get the system up this week so I went back to OpenACS 4.6.3 with aol 3.3oacs1 and the same version of Postgres (7.4.2). I had to fix some varbit stuff in postgres.sql in the kernel pkg, but other than that, everything works flawlessly on Suse 9.1.
My conclusion is that it is definitely not a Postgres 7.4.2 problem since it works with other versions and I can drop tables all day long in a transaction by hand. It is also unlikely that it is a Suse problem since everything works with older versions. I don't think that a db_transaction with a dml statement that drops a couple of tables is likely to be the problem either especially since you have it working on redhat and I ran the package manager to drop all the tables in a package and got the same hang as my code. So, I am almost certain it is the nspostgres driver – probably related to the complier warnings which I haven't had time to figure out yet.
I still would like to use 5.1 - but it might be a while until I have time to figure out the driver problem. Hopefully someone else has seen and solved this already.
select relname,pid,mode,granted from pg_locks inner join pg_class on pg_class.oid = relation;
And there seemed to be an exclusive lock held by one thread on apm_packages (probably something in the APM package uninstall procs), and another thread trying to acquire a lock (cannot remember what type) on that table, causing a deadlock. This was on a system upgraded from 4.6.3 to 5.0.0, using PG 7.3 (previously upgraded from 7.2), so it's fair to say it was a bit of a shot system.
To be honest, I couldn't ever be bothered to figure out how to resolve it, I just ran the drop scripts by hand. I think (and this is a fairly uneducated guess) that it could have something to do with how PG managed foreign key constraints in 7.2. I think it used to do it with a trigger on the table being referred (in this case apm_packages), and so when dropping the referring table you need to acquire a lock (possibly exclusive, think the PG manuals have something to say about that) on the referred table to delete the trigger. Hence the deadlock.
That sure sounds like the problem. When I was trying to figure out what was wrong (web server hung) I noticed that postgres would still show the tables in psql as being in the database but I couldn't do any operation on the tables like select etc. The select would just hang - almost certainly because of an exclusive lock/deadlock problem. As soon as I get caught up with some paying work that I have at the moment - I'll replicate the problem and check the locks. If your theory is correct I'll try to figure out a fix and post it as soon as I can.