Forum OpenACS Q&A: Oracle problem - any ideas?
in case this has happened to someone else:
This system crashed while I was importing a large table (lots of
LOBs). When I start up I get a message that says that system.dbf is
more recent than the control file.
First I tried starting up with each control file in turn - no luck
(though now it won't start up at all if I switch away from the control
file I'm currently using). So I moved on to "recover database using
It needs sequence #16163. I tried telling it to read from redo01.log,
which was the active redo log at the time of the crash, but it tells
me that log contains sequence 16162. So then I tried redo02.log, which
it was unable to read (internal error code, arguments: ,
, , , , , , ) and redo03.log, which
of course was too old (sequence 16161).
It looks to me like the needed sequence wasn't written, due to the
crash. Maybe it was about to switch to redo log 2, and crashed during
the switch? Whatever, I just need to get this database up and running!
Is there anything I can do here, short of reinstalling and loading
last night's dump?
It is times like this when I realize that I don't know as much about
Oracle as I need to! :)
(I'm leaving out a *lot* of steps I went through to figure this out!):
Edit init.ora and try each control file in turn until you find one that Oracle likes (it will spit more errors about your control file being out of sync if it doesn't like the one you chose).
recover database using backup controlfile<br>
Add these two lines to your init.ora file:<br>
Presto! Now I'm doing a full export, and will recreate the database and import the data.
Even after getting a successful startup, which should have reconciled everything, it will not start up without fast_start_parallel_rollback set to false.
At this point, only the control file you've been working with is valid; the others are hopelessly out of date. I'm not entirely sure what the recovery process is for that, but it doesn't matter since everything's getting blown away shortly.
Time to read up on archive logs. I have a sneaking suspicion that doing nightly dumps is really not all that's required here! :)
All good stuff, and we do need to improve in this area. However, as far as I know it wouldn't have helped with yesterday's problem.
What happened was that the system dropped like a rock while I was doing an import. The initial problem was that the control file was newer (had a higher SCN) than the system dbf file. The solution to this problem is to either move forward using log files, or roll back to a point in time where all data is available.
I had all the pieces I needed. The problem was that the very last change wasn't written to the redo log, because of the unexpectedness of the system going down. I should have been able to get Oracle to cleanly roll back to a previous point in time, but for unknown reasons it just wouldn't do it. So I finally resorted to that undocumented flag, which overlooks the fact that the redo logs are junky and reconciles things as best it can.
With the backup strategies you're talking about, all of which are better than nightly dumps, I could have restored from backups without losing a day's worth of entries, and that's a good thing. I've been meaning to look into better strategies and haven't had time, so I'll definitely look at your notes. But at the moment I'm more interested in understanding recovery strategies. As I understand it, even if I had archive logs I would have had the same chance of them being out of sync, and I would have been in the same situation I ended up in anyway.
In any case, I got past that problem only to create a new one. I installed the 184.108.40.206 patch on this system, which is apparently incompatible with RH 7.2. Completely hosed the Oracle installation. :( At first I wasn't sure if the problems I was having were due to database problems (Oracle did warn about subtle corruption after using that flag) or what, so I did a fresh Oracle install, added the patch, and boom, Oracle no workie once again. So now I'm moving the site to a temporary location so we can upgrade that system to 7.3, which I know does work with the patch.
This is *not* how I wanted to spend this weekend! :)
redo log properly during an import, well, that sounds like exactly the
sort of thing using Oracle is supposed to avoid, doesn't it? But if
Oracle crashed, then maybe all bets are off - not good. I don't
remember enough about how the control file is supposed to interact
with the redo logs to guess what's really going on there, though.
Definitely sounds like one to open an Oracle TAR on. Please let us
know if you get any useful info out of them! (And whatever else you
figure out yourself, of course.)
I won't be opening a TAR on this. In order to do that I'd have to leave the system untouched, and I've already done as much recovery as one can do so the evidence has been spoiled. And today we'll be reinstalling the system with RedHat 7.3 so I can install Oracle with all the proper patches, so then the evidence will be gone. This is the most powerful system we have, and the site's currently on a lesser system and really chugging. I can't wait around on this.
I did post about this to the forum on Metalink, so if someone from Oracle is interested in knowing that it happened, the record is there for them to see. There were plenty of people with this problem posting there, but most of them had come about it differently. They'd experienced a full system failure, had to restore from cold backups, and found that their backup was hosed because Oracle wasn't really shut down when they took the backup like they thought it was.
On the whole, I'm actually pretty happy with the nightly-dump backup strategy. There are plusses and minuses to cold and hot backups; with the dump method there's a high likelyhood of having to reinstall Oracle to recover from a catastrophic failure, which the cold backup method avoids. But IMHO the likelyhood of either AOLserver or Oracle failing to start back up some night is greater than the likelyhood of having a catastrophic failure (I think this is the first one I've had in 3.5 years). I think if I add archive logging, which saves me from losing the day's changes, we'll be in good shape.
After that I set up everything for archive mode and also a seperate server to have a backup of the archives. I will try to search for the notes, but I may be stuck typing them back in.
In any case, I also learned, if you value your data you better run in archive mode ( and even that has its pitfalls, although not being a full time dba I may have just done it wrong).