Forum OpenACS Development: aolserver/postgres hang

Collapse
Posted by Andrei Mitran on
I am using - aolserver 4.0 r2, openACS 5.1, postgres 7.4.2 on Suse 9.1.

When I uninstall a package aolserver hangs on some drop table commands - which is in the middle of a db_transaction - so a postgres thread hangs. The tables that were in the transaction that have not been droped yet are locked (as expected). So to recover I have to kill aolserver and the postgres thread with the lock (waiting for drop table). As far as I can tell there is no good reason for this - I can manually drop the tables  - so there are no dependencies.

I suspect that the problem is the nspostgres driver. I had serveral compiler warnings  "warning: passing arg 3 of `Tcl_DStringAppend' as signed due to prototype" which I wasn't able to get rid of - nor was I able to find a satisfactory solution in any of the forums.

Has anyone seen this problem? Has anyone come up with a solution to the nspostgres compile problems.

Thanks,

Andrei

Collapse
Posted by Mark Aufflick on
Hi Andrei,

I can't say I remember calling a drop table from within a transaction, but I just tried it with the same versions as you have above (on redhat 9), and at least for a test table with two rows, droping the table within a transaction worked fine.

I don't remember seeing those compiler warnings either - do you have the version of nspostgres as referred to by:

https://openacs.org/doc/openacs-5-1/aolserver4.html

?

Collapse
Posted by Andrei Mitran on
Yes, the nspostgres version that I have is correct.

I had to get the system up this week so I went back to OpenACS 4.6.3 with aol 3.3oacs1 and the same version of Postgres (7.4.2). I had to fix some varbit stuff in postgres.sql in the kernel pkg, but other than that, everything works flawlessly on Suse 9.1.

My conclusion is that it is definitely not a Postgres 7.4.2 problem since it works with other versions and I can drop tables all day long in a transaction by hand. It is also unlikely that it is a Suse problem since everything works with older versions. I don't think that a db_transaction with a dml statement that drops a couple of tables is likely to be the problem either especially since you have it working on redhat and I ran the package manager to drop all the tables in a package and got the same hang as my code. So, I am almost certain it is the nspostgres driver – probably related to the complier warnings which I haven't had time to figure out yet.

I still would like to use 5.1 - but it might be a while until I have time to figure out the driver problem. Hopefully someone else has seen and solved this already.

Andrei
.

Collapse
4: DDL does a commit (response to 1)
Posted by Andrew Piskorski on
At least in Oracle 8.x, doing any DDL whatsoever, including adding or dropping a table, immediately performs an implicit commit of the the current transaction. I assume Oracle 9.x and 10.x work the same way, but I've no idea what PostgreSQL does.
Collapse
Posted by Tom Ayles on
I seem to remember hitting a problem like this when trying to drop a table referencing apm_packages using the APM. I checked the locks in psql using:

select relname,pid,mode,granted from pg_locks inner join pg_class on pg_class.oid = relation;

And there seemed to be an exclusive lock held by one thread on apm_packages (probably something in the APM package uninstall procs), and another thread trying to acquire a lock (cannot remember what type) on that table, causing a deadlock. This was on a system upgraded from 4.6.3 to 5.0.0, using PG 7.3 (previously upgraded from 7.2), so it's fair to say it was a bit of a shot system.

To be honest, I couldn't ever be bothered to figure out how to resolve it, I just ran the drop scripts by hand. I think (and this is a fairly uneducated guess) that it could have something to do with how PG managed foreign key constraints in 7.2. I think it used to do it with a trigger on the table being referred (in this case apm_packages), and so when dropping the referring table you need to acquire a lock (possibly exclusive, think the PG manuals have something to say about that) on the referred table to delete the trigger. Hence the deadlock.

Collapse
Posted by Andrei Mitran on
Thanks,

That sure sounds like the problem. When I was trying to figure out what was wrong (web server hung) I noticed that postgres would still show the tables in psql as being in the database but I couldn't do any operation on the tables like select etc. The select would just hang - almost certainly because of an exclusive lock/deadlock problem. As soon as I get caught up with some paying work that I have at the moment - I'll replicate the problem and check the locks. If your theory is correct I'll try to figure out a fix and post it as soon as I can.

Andrei