Forum OpenACS Q&A: Postgres database recovery

Collapse
Posted by Angel Hernandez on
Is there a way to "clean" a corrupted PG database?

I had to shut down my box for a few minutes tonight, and when shutting down, I forgot to cleanly kill Postmaster. After restarting and attempting to start Postmaster again, I received the following message.

DEBUG:  Data Base System is starting up at Sun Jul 23 21:30:37 2000
DEBUG:  Data Base System was interrupted being in recovery at Sun Jul
23 21:27:54 2000
	This propably means that some data blocks are corrupted
	And you will have to use last backup for recovery
DEBUG:  CheckPoint record at (0, 1016)
DEBUG:  Redo record at (0, 1016); Undo record at (0, 0)
DEBUG:  NextTransactionId: 164479; NextOid: 0
DEBUG:  Invalid NextTransactionId/NextOid
DEBUG:  The DataBase system was not properly shut down
	Automatic recovery is in progress...
DEBUG:  Redo starts at (0, 1064)
Startup failed - abort

Want even worse news? I don't have a backup of the db - DOH! Am I out of luck, or is there a way I can recover my corrupted db?

Collapse
Posted by Don Baccus on
I suggest you take this question over to the appropriate postgres mailing list and ask for help.  You may be out of luck.  Bullet-proofing of the database against this kind of unexpected shutdown is a major goal of PG 7.1, which will implement a write-ahead logging system (in essence, you get two copies of the database at all times, last backup + write-ahead log == current database + transaction  log).

I'm surprised you did so much harm by shutting down the system, you weren't running with "-F" (no fsynch), were you?  In another thread I pointed out that Linux can only guarantee that blocks make it to the disk drive, which in practice means the drive's cache, and platter writing can lag, causing problems in the case of a crash.  If the system was busy when you shut it down, something like this might have caused the corruption you're seeing.

Normally, unexpected shutdown just causes the loss of uncompleted transactions, not corruption like this.

You can help avoid problems by:

1. Using the postgres script in /etc/rc.d/init.d to start and stop the  postmaster (assuming you're running RH, the scripts move around according to distro).  System startup and shutdown also uses this script, gently killing the postmaster so it shuts down cleanly when you stop the system.  Then you don't have to worry about remembering to do so manually.

2. Doing backups.  The documents have a script I've written for my own  production site that you can modify for your own site.