Forum OpenACS Q&A: PostgreSQL dying
I am facing this problem for the first time... I have this OpenACS 3.2.2 installation on an AMD K6-2 300 with 64 Mb of RAM. It originally had only 32 Mb, so I was having PostgreSQL start with no '-B x' or '-o "-S y"' options as Don recommends because it would swap too much.
I asked and 32 more megs were put in the box. I then set PostgreSQL with '-B 500 -o "-S 500"' so it would have some more room to work, but now PostgreSQL keeps dying on me very frequently. Apparently it brings itself up and does its own cleaning, but AOLserver (3 running nsd76) goes defunct immediately without returning an error message or anything.
What could be causing this ? Bad RAM ? Wrong settings (-B, etc...) ? And what can I do to solve this (aD's keepalive ?)?
Here's what AOLserver leaves in the logs... Any help is appreciated.
[10/Jul/2000:18:12:07][5211.9220][-conn4-] Notice: Querying 'update clickthrough_log set click_count = click_count + 1 where local_url = 'index.adp' and foreign_url = 'http://www.bookstore.usu.edu' and trunc(entry_date) = trunc(sysdate());' NOTICE: Message from PostgreSQL backend: The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory. I have rolled back the current transaction and am going to terminate your database system connection and exit. Please reconnect to the database system and repeat your query.
[10/Jul/2000:18:29:27][5675.4101][-conn0-] Notice: starting: waiting for connections [10/Jul/2000:18:29:37][5675.4101][-conn0-] Notice: Ns_PgExec: Trying to reopen database connection [10/Jul/2000:18:29:37][5675.4101][-conn0-] Notice: Opening exchange on localhost [10/Jul/2000:18:29:37][5675.4101][-conn0-] Error: Ns_PgOpenDb(postgres): Could not connect to localhost::exchange: The Data Base System is starting up [10/Jul/2000:18:29:37][5675.4101][-conn0-] Error: nsd.db: error(localhost::exchange, pqReadData() -- backend closed the channel unexpectedly. This probably means the backend terminated abnormally before or while processing the request. ): update users set last_visit = sysdate() where user_id = 3
Interesting thing is that even with AOLserver down, going to psql and trying to delete a row from a table would cause the backend to die. So it had nothing to do with AOLserver.
So after wandering for a while, I dropdb'd and reloaded my database from scratch from the openacs datamodel... Voila ! No more instabilities... PostgreSQL has been working like a charm.
Could this mean that there's a problem with pg_dump ?
In reality, one normally would do a dropdb before a full restore just to clean out the various tables. I'm also curious as to whether or not Roberto got any errors during his reload (Roberto?).
Anyway, it sounds like one or more system tables got screwed up by the reload, and even if reloading fails it shouldn't cause backend crashes.
I then recompiled my PostgreSQL (I was intrigued and thought, why not ?), initdb'd, reloaded the database, and still unstable.
The only way that I solved this was reloading the DB from scratch from the datamodel and modifying the dump with just the tables I wanted (which was a pain in the butt).
The errors showed also that close to the end of the dump, PG says that it can't find some tables, although their definition is in the dump (e.g. survsimp_responses). Which means that it didn't create them, for what reason I don't know because they seem pretty normal to me. I think the non-creation of these tables were due to the fact that PG didn't create the function sysdate():
ERROR: Function 'sysdate()' does not exist Unable to identify a function that satisfies the given argument types You may need to add explicit typecasts (repeated 4 times) ERROR: Relation 'users_special_quotas' does not exist . ERROR: Relation 'survsimp_surveys' does not exist (repeated several times) ERROR: DefineIndex: relation "survsimp_surveys" not found ERROR: Relation 'users_special_quotas' does not exist ERROR: Relation 'survsimp_responses' does not exist ERROR: select rules target entry 16 has different size from attribute m_address ERROR: select rules target entry 4 has different size from attribute team_leader_p ERROR: select rules target entry 3 has different size from attribute on_which_table ERROR: select rules target entry 2 has different size from attribute salary ERROR: view rule for ec_subsubcategories_augmented must be named _RETec_subsubcategories_augmented ERROR: view rule for ec_gift_certificates_approved must be named _RETec_gift_certificates_approved ERROR: view rule for ec_gift_certificates_purchased must be named _RETec_gift_certificates_purchased ERROR: view rule for ec_fin_transactions_reportable must be named _RETec_fin_transactions_reportable ERROR: select rules target entry 7 has different size from attribute read_permission_p ERROR: select rules target entry 8 has different size from attribute email_bouncing_p ERROR: select rules target entry 9 has different size from attribute approved_p ERROR: view rule for user_group_member_field_mapping must be named _RETuser_group_member_field_mapping
I just forgot to add that the backend continues to crash with this last reload I made.
However, I can consistently reproduce backend crashes by simply pg_dumping and reloading the OpenACS datamodel and some data in three different machines running different versions of PostgreSQL 7.
I don't know about you all, but this scares the heck out of me. What other ways are there to backup a PG database?
If you can reproduce this problem as easily as you say (which I know you can, of course), the pg_hackers mailing list should hear about it.
Visit www.postgresql.org and chase links and you'll find out how to subscribe (it's a majordomo list, of course).
There are people there who do nothing but work on PG, and they'll be interested.
We did: on the to-be-backed-up machine: 1. pg_dump -o HRS > hrs.pgdump on the restore machine: 2. dropdb HRS 3. createdb HRS 4. psql -e < hrs.pgdump The echoed commands did not show errors as when using cat hrs.pdump |psql HRS (which had errors) 5. Then we restarted nsd Q: What is the proper way to backup and restore an entire openACS site thanks.