Forum OpenACS Q&A: Database disappeared

Posted by Chris Hardy on
This last Monday, I was making changes to a PostgreSQL server to allow a host to connect (pg_hba.conf). When I restarted the server and then restarted the AOLserver process, there was only one table left. All the ACS tables were gone, poof.

As I troubleshot the problem, I found that the last known good dump (I run pg_dump every four hours) was from a month ago, when I made a change to the ttracker_tickets table.

From then until this last Monday, OpenACS worked fine, but the dumps were missing all the ACS-related tables (I only found that out on Monday).

In the dump files, the schema for the one remaining table was wrong as well: it showed only the column that I added with the ALTER TABLE.

There are other Apache/Perl apps on this host, and they didn't experience anything like this, which leads me to believe this was an AOLserver/PostgreSQL problem.

Anyone have any ideas/experience with this?
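
A gap like the one described above (dumps that silently stop containing most tables) could in principle be caught by checking each dump after it is written. The following is a minimal sketch, assuming plain-text pg_dump output in which each table appears as a `CREATE TABLE name (` statement. The function names and the list of expected tables are illustrative and not part of the original setup.

```python
import re

# Matches the start of a CREATE TABLE statement in a plain-text dump,
# allowing an optional schema prefix and optional double quotes.
CREATE_TABLE = re.compile(
    r'^CREATE TABLE\s+(?:"?\w+"?\.)?"?(\w+)"?\s*\(',
    re.IGNORECASE | re.MULTILINE,
)


def tables_in_dump(dump_text):
    """Return the set of table names created by a plain-text dump."""
    return {m.group(1) for m in CREATE_TABLE.finditer(dump_text)}


def missing_tables(dump_text, expected):
    """Return the expected table names that the dump does not create, sorted."""
    found = tables_in_dump(dump_text)
    return sorted(name for name in expected if name not in found)


if __name__ == "__main__":
    # Hypothetical dump contents and expected-table list, for illustration only.
    sample = (
        "CREATE TABLE public.ttracker_tickets (\n"
        "    id integer\n"
        ");\n"
    )
    print(missing_tables(sample, ["acs_objects", "users", "ttracker_tickets"]))
```

Run after each scheduled dump, a non-empty result would be a reason to raise an alert rather than overwrite older dumps.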

2: Re: Database disappeared (response to 1)
Posted by Jeff Davis on
I have never heard of anything like this happening, nor do I think OpenACS or AOLserver could really cause something like this. (In fact, if you try to drop a package with OpenACS, you will see that most of the time it fails.)

I would look at the PostgreSQL server log around the time that the dumps stopped containing the data and check for anything unexpected (the log is in $PGDATA unless you have moved it).

You should also do a psql -l and see if the databases you expect are there.

You might also make sure that there are not two instances of PostgreSQL running (say 7.2 and 7.3), and that you are looking at the instance you really expect to be looking at.
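
The `psql -l` check, repeated against each instance, can be scripted. This is a rough sketch, assuming that `psql -l -A -t` prints one `|`-separated row per database with the name in the first field (the exact columns vary between versions). The function names are made up for illustration, and any port numbers passed in would be specific to your installation.

```python
import subprocess


def parse_database_list(listing):
    """Extract database names from unaligned, tuples-only `psql -l` output."""
    names = []
    for line in listing.splitlines():
        # Lines without a field separator are blank lines or continuation
        # lines (for example, extra access-privilege entries); skip them.
        if "|" not in line:
            continue
        name = line.split("|", 1)[0].strip()
        if name:
            names.append(name)
    return names


def list_databases(port=None):
    """Run psql against one server instance and return its database names."""
    cmd = ["psql", "-l", "-A", "-t"]
    if port is not None:
        cmd += ["-p", str(port)]  # choose which running instance to ask
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_database_list(result.stdout)
```

Calling `list_databases()` once per port that a server might be listening on, and comparing the results, would show whether the database you expect lives in a different instance from the one you have been connecting to.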

3: Re: Database disappeared (response to 1)
Posted by Chris Davies on
Do you use Debian? Did you recently upgrade PostgreSQL?

The maintainer's upgrade scripts rarely work for me and have resulted in data loss like this several times.

4: Re: Database disappeared (response to 3)
Posted by Chris Hardy on
I don't use Debian; I use RH 9. Looking around, I found that the commit log directory (pg_clog) was huge (500M). That makes me wonder whether it was just plain DB corruption, or whether, when the "ALTER TABLE" was run, something had locked that instance of the DB and everything since then had been running off of the temporary tablespace.

How often will OpenACS do a SQL commit?

5: Re: Database disappeared (response to 1)
Posted by Don Baccus on
You're guaranteed a commit after every HTTP request is satisfied.  And of course every DML statement that's not wrapped in an explicit transaction is itself a transaction with a subsequent commit.

So the answer is "a lot".
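
The difference between a bare DML statement and one inside an explicit transaction can be seen in a small experiment. The sketch below uses Python's built-in sqlite3 module as a stand-in, not PostgreSQL or the AOLserver database driver, so it only illustrates the general autocommit behavior described above. The table name and function names are arbitrary.

```python
import os
import sqlite3
import tempfile


def committed_rows(path):
    """Count rows through a fresh connection, which sees only committed data."""
    conn = sqlite3.connect(path)
    try:
        return conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    finally:
        conn.close()


def commit_visibility_demo():
    """Return row counts seen by another connection at three points in time."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    # isolation_level=None stops the module from opening transactions
    # implicitly, so each bare statement commits on its own.
    writer = sqlite3.connect(path, isolation_level=None)
    try:
        writer.execute("CREATE TABLE t (x INTEGER)")

        writer.execute("INSERT INTO t VALUES (1)")  # bare DML: its own transaction
        after_bare_insert = committed_rows(path)

        writer.execute("BEGIN")                     # explicit transaction
        writer.execute("INSERT INTO t VALUES (2)")
        inside_transaction = committed_rows(path)   # second row not committed yet

        writer.execute("COMMIT")
        after_commit = committed_rows(path)
        return after_bare_insert, inside_transaction, after_commit
    finally:
        writer.close()
        os.remove(path)


if __name__ == "__main__":
    print(commit_visibility_demo())  # prints (1, 1, 2)
```

The first insert is visible to the second connection immediately, while the insert inside the explicit transaction only becomes visible after COMMIT.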