Forum OpenACS Q&A: Rescuing a corrupted db ...

Posted by Ola Hansson on
Hi,

Yesterday my server suffered both a hard drive failure and corruption of my PG 7.2.4 database in one fell swoop ... Unfortunately, the drive that went down held the nightly dumps for my backgammon site, and for reasons beyond comprehension those were the only dumps I'd kept. Doh!

Here is how my revival attempts have gone so far. If anyone has suggestions about what I can do, I would be eternally grateful:


postgres72@hal:~$ /usr/local/pgsql-7.2/bin/pg_ctl -D /usr/local/pgsql-7.2/data/ start
postmaster successfully started
postgres72@hal:~$ DEBUG:  database system was shut down at 2004-08-19 09:05:35 CEST
DEBUG:  open of /usr/local/pgsql-7.2/data//pg_xlog/0000000000000006 (log file 0, segment 6) failed: No such file or directory
DEBUG:  invalid primary checkpoint record
DEBUG:  open of /usr/local/pgsql-7.2/data//pg_xlog/0000000000000006 (log file 0, segment 6) failed: No such file or directory
DEBUG:  invalid secondary checkpoint record
FATAL 2:  unable to locate a valid checkpoint record
DEBUG:  startup process (pid 11521) exited with exit code 2
DEBUG:  aborting startup due to startup process failure

postgres72@hal:~$
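
The startup log above complains that segment 0000000000000006 cannot be opened in pg_xlog. A quick way to confirm what is actually on disk is sketched below. This is only an illustration: the helper name `check_xlog_segment` is made up for this post, and it does nothing beyond listing the directory and testing for the file.

```shell
# Hypothetical helper (not part of PostgreSQL): list the WAL segments that
# are present and report whether the one named in the startup error exists.
check_xlog_segment() {
    datadir=$1   # data directory, e.g. /usr/local/pgsql-7.2/data
    segment=$2   # segment file name from the log, e.g. 0000000000000006

    # Show whatever segments survived; stay quiet if pg_xlog itself is gone.
    ls -l "$datadir/pg_xlog" 2>/dev/null

    if [ -f "$datadir/pg_xlog/$segment" ]; then
        echo "segment $segment is present"
    else
        echo "segment $segment is missing"
        return 1
    fi
}
```

It could be called as `check_xlog_segment /usr/local/pgsql-7.2/data 0000000000000006`; a non-zero exit status means the segment file is not there.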

I compiled pg_resetxlog from contrib because Tom Lane of the PG group had recommended it to someone else who had hit a similar error.

postgres72@hal:~$ /usr/local/pgsql-7.2/bin/pg_resetxlog /usr/local/pgsql-7.2/data/
XLOG reset.
postgres72@hal:~$
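
pg_resetxlog rewrites the control information and discards the existing write-ahead log, so a file-level copy of the data directory taken before a reset like the one above keeps the original state around for further attempts. A minimal sketch, assuming the postmaster is stopped and there is room on another disk; the function name `snapshot_pgdata` and the destination path are hypothetical:

```shell
# Hypothetical helper: copy a stopped cluster's data directory aside before
# running any tool that modifies it.
snapshot_pgdata() {
    src=$1    # data directory, e.g. /usr/local/pgsql-7.2/data
    dest=$2   # where the copy should go (must not exist yet)

    # Never overwrite an earlier snapshot.
    if [ -e "$dest" ]; then
        echo "refusing to overwrite $dest" >&2
        return 1
    fi

    # postmaster.pid suggests a running (or uncleanly stopped) postmaster;
    # copying underneath a live server would give an inconsistent snapshot.
    if [ -f "$src/postmaster.pid" ]; then
        echo "postmaster.pid found in $src; stop the server first" >&2
        return 1
    fi

    # -R copies the tree recursively, -p preserves ownership, modes and times.
    cp -Rp "$src" "$dest"
}
```

For example: `snapshot_pgdata /usr/local/pgsql-7.2/data /mnt/otherdisk/data-before-reset`, where the second path is only a placeholder.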

Starting PG is now possible:

postgres72@hal:~$ /usr/local/pgsql-7.2/bin/pg_ctl -D /usr/local/pgsql-7.2/data/ start
postmaster successfully started
postgres72@hal:~$ DEBUG:  database system was shut down at 2004-08-19 10:18:02 CEST
DEBUG:  checkpoint record is at 0/8000010
DEBUG:  redo record is at 0/8000010; undo record is at 0/8000010; shutdown TRUE
DEBUG:  next transaction id: 141; next oid: 24748
DEBUG:  database system is ready

postgres72@hal:~$

My databases no longer show up in the list, though:

postgres72@hal:~$ /usr/local/pgsql-7.2/bin/psql -l
         List of databases
   Name    |   Owner    | Encoding
-----------+------------+----------
 template0 | postgres72 | UNICODE
 template1 | postgres72 | UNICODE
(2 rows)

postgres72@hal:~$

I had one database called "backgammon", but although it's not listed I am actually able to psql into it:

postgres72@hal:~$ /usr/local/pgsql-7.2/bin/psql backgammon
Welcome to psql, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help on internal slash commands
       \g or terminate with semicolon to execute query
       \q to quit

backgammon=#

However, no relations can be found:

backgammon=# \d
No relations found.
backgammon=#

There may still be hope, though, because the table definitions seem to be intact (although all the rows are gone):

backgammon=# select * from bg_games;
 game_id | match_id | game_number | player_a_points | player_b_points | game_start | game_finish | crawford_game_p | game_winner | points_won | win_type | final_cube_value | bg_matches_updated_p
---------+----------+-------------+-----------------+-----------------+------------+-------------+-----------------+-------------+------------+----------+------------------+----------------------
(0 rows)

backgammon=#

So, is there still hope, or should I just "initdb" and get over it? 🤔