Forum OpenACS Q&A: Database errors after power fails

Collapse
Posted by Bob OConnor on

After being up 101 days, our RH6.1 server running openacs 3.2.2 crashed and rebooted due to a local power failure.

Everything *Seems* to be running ok but this afternoon, 4 hours after the failure AND having left the ssh1 secure telnet window open and running "psql mydata" (pg7.01) AND the telnet window being about 900 miles from the server, I got these errors:

mydata=# ERROR:  Relation 'user_notifications' does not exist
ERROR:  No such attribute or function 'community_contact_p'
ERROR:  Cannot insert a duplicate key into unique index 
referer_log_pkey
ERROR:  Cannot insert a duplicate key into unique index 
referer_log_pkey
ERROR:  Cannot insert a duplicate key into unique index 
referer_log_pkey
ERROR:  No such attribute or function 'community_contact_p'
ERROR:  Relation 'user_notifications' does not exist
ERROR:  No such attribute or function 'community_contact_p'
ERROR:  No such attribute or function 'community_contact_p'
mydata=#

The previous command had run successfully and I haven't yet hunted down where the community_contact_p field is located.

I can continue to run commands from this psql session but I wonder if I have *subtle* corruption. For now, I'll run a backup which includes vacuum.

Has anyone seen this with psql? I would have expected to see these types of errors in the server.log and not psql.

TIA
-Bob

Collapse
Posted by Don Baccus on
For starters, all RDBMS systems are plagued with potential failure problems when there's a power outage depending on how your hardware is  configured.

A nasty "feature" of many modern disks is that they buffer writes, and  return "success" to the operating system when data makes it to the BUFFER rather than waiting until it makes it to the PLATTER.  The OS then has no way of knowing exactly when the data is truly safely stuffed onto the physical platter.

Then the OS tells the RDBMS that the data's successfully written and the RDBMS by necessity believes what it is told.

Of course, disk manufacturers are interested in maximizing performance  so generally ship disks from the factory set up in this mode.

This is insidious and fixing it requires that you check those little jumpers stuck into the back of each and every drive in your system, with each manufacturer having a different jumper scheme (which often changes within each manufacturer's disk line), etc.

Expensive disk arrays have enough battery backup to ensure that all data can physically be written before shutting down.  If you have a UPS hooked up to your box and if you've taught the system to shutdown the database when power fails and before the battery runs out, you'll also circumvent the problem.

Having passed along this truly depressing packet of information, I'll have to say that this doesn't look like the cause of your problem.  You're on the right track running pg_dump and following it with a vacuum.  The table that's missing is found in notifications.sql so it should be there - maybe you are seeing a subtle form of corruption, I hope not!

You should really upgrade to PG 7.1 BTW, it's considerably more robust  (mostly due to bug fixes).

Collapse
Posted by S. Y. on
I think some folks refer to what Don described as write-back caching. If you have one of the nice new Ultra160 drives, your disk buffer is four or eight megabytes. Even a consumer-grade ATA-100 drive has a two meg disk buffer. As an OpenACS developer, you should be able to realize how many typical bboard messages can fit in two megs.

Somewhere on the Adaptec web site, is a very terse FAQ that describes write-back caching. In that FAQ, they clearly warn that write-back caching should only be used on UPS-backed systems, lest the problems Bob described occur.

My Adaptec 29160 64-bit SCSI controller has three options for write-back caching: Yes, No, and N/C. The latter stands for "No Change", meaning that the SCSI controller will use whatever the drive has been set up for. Thus, if your drive manufacturer ships drives with drive-back caching turned on, then the SCSI controller will do it. And yes, the Adaptec SCSI controller defaults are set to "N/C".

Journalled filesystems don't help since the OS or database doesn't know if everything has committed to the disk platters or not. It's really a function of the disk drive hardware.

I'm sorry to have to parrot Don, but mission-critical data deserves a UPS. Period. And you have to be absolutely sure you can trust your client's data to that UPS. That means testing the computer/UPS connection. I use apcupsd and currently my system is supposed to go down with two minutes left on the batteries. I'm not running a database, and my system can shutdown properly in 20-30 seconds. You need to time your system shutdowns and make sure you have plenty of leeway for an orderly halt.

This is really a system administration issue, rather than a web/db developer issue. If you don't have a professional sysadmin on staff, you need to find someone to worry about this sort of stuff. Even good sysadmins make mistakes though, which is why you know banks, insurance companies, and hospitals still have UPSes on their computers.

Collapse
Posted by Rodger Donaldson on
Of course, sometimes UPSes don't help.  I've worked in a number of places where there have been outages because:

1/ The UPS fails when grid power dissapears (never tested regularly, probably never worked).

2/ Electricians get involved (nohing like a fumble fingered electrician to take down your power network).

3/ UPS faults take down the system even though grid power is present (which is why dual power supply systems are best set up with one UPS rail and one grid rail, rather than both on UPS).

4/ Dummies have plugged so much non-essential equipment into the UPS that there isn't enough power for a clean shutdown.

Battery backing your disk arrays and controller so it can last the few seconds needed to flush to disk in an emergency is another level of protection (NetApps do this, for example), but this is subject to the same potential issues as a UPS.

The only thing you can really do is bite the bullet and pay the penalty of diabling write caching to your drives (or at least the drives that DB writes happen on).  You'll just need to make up the speed with a good system design.