Forum OpenACS Q&A: Mysterious Crash, losing connection with Postgres

Hello Everyone,

I am currently working on a project that uses among others the forums package and search using the tsearch driver.

Although I have yet to test this with a fresh checkout or a tarball from downloads I would just like to ask if anybody has encountered anything similar with the two packages I've mentioned.

The problem occurs when I use the application with a database that has been restored from a pgdump. Everytime I try to reply to the first forum post (article) the following error occurs.

******** LOG ****************
[26/Nov/2004:15:31:35][14703.1108110256][-conn:project1::4] Debug: NO FULLQUERY FOR dbqd.forums.tcl.messages-procs.forum::message::do_notifications.get_message_tree --> using default SQL
[26/Nov/2004:15:31:35][14703.1108110256][-conn:project1::4] Notice: Querying '
select f1.message_id as ancestor_message_id
from forums_messages f1
where f1.tree_sortkey in (select tree_ancestor_keys(f2.tree_sortkey)
from forums_messages f2
where f2.message_id = '1644')
and forum_id = '1093';'
[26/Nov/2004:15:31:35][14703.1108110256][-conn:project1::4] Notice: Ns_PgExec: Trying to reopen database connection
[26/Nov/2004:15:31:35][14703.1108110256][-conn:project1::4] Notice: Opening project1 on localhost
[26/Nov/2004:15:31:35][14703.1108110256][-conn:project1::4] Error: Ns_PgOpenDb(postgres): Could not connect to localhost::project1: FATAL: the database system is starting up

[26/Nov/2004:15:31:35][14703.1108110256][-conn:project1::4] Notice: Ns_PgExec: In transaction, conn died, error returned
[26/Nov/2004:15:31:35][14703.1108110256][-conn:project1::4] Error: dbinit: error(localhost::project1,server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
): '
select f1.message_id as ancestor_message_id
from forums_messages f1
where f1.tree_sortkey in (select tree_ancestor_keys(f2.tree_sortkey)
from forums_messages f2
where f2.message_id = '1644')
and forum_id = '1093'
******** LOG ****************

The wierd part of this is that this only happens, at least for me, when I use a restored database. If I have a clean install of the web application, it does not occur.

I tried to comment out the query that's causing the probem but another issue occurs that causes the application to lose connection with postgres. This time the sql statement which performs indexing from the tsearch driver seems to be the culprit

**********LOG *************
'
[26/Nov/2004:15:50:26][15250.1114426288][-sched:27-] Debug: NO FULLQUERY FOR dbqd.tsearch2-driver.tcl.tsearch2-driver-procs.tsearch2::index.object_exists --> using default SQL
[26/Nov/2004:15:50:26][15250.1114426288][-sched:27-] Notice: Querying 'select 1 from txt where object_id='1679';'
[26/Nov/2004:15:50:26][15250.1114426288][-sched:27-] Notice: dbinit: sql(localhost::project1): 'select 1 from txt where object_id='1679''
[26/Nov/2004:15:50:26][15250.1114426288][-sched:27-] Debug: NO FULLQUERY FOR dbqd.tsearch2-driver.tcl.tsearch2-driver-procs.tsearch2::index.index --> using default SQL
[26/Nov/2004:15:50:26][15250.1114426288][-sched:27-] Notice: Querying 'insert into txt (object_id,fti,title_fti,description_fti) values ( '1679', to_tsvector('default','Re: Satellite data ''could save fish'' test test'), to_tsvector('default','Re: Satellite data ''could save fish'''), to_tsvector('default','Re: Satellite data ''could save fish'' test test'));'
[26/Nov/2004:15:50:26][15250.1114426288][-sched:27-] Notice: Ns_PgExec: Trying to reopen database connection
[26/Nov/2004:15:50:26][15250.1114426288][-sched:27-] Notice: Opening project1 on localhost
[26/Nov/2004:15:50:26][15250.1114426288][-sched:27-] Error: Ns_PgOpenDb(postgres): Could not connect to localhost::project1: FATAL: the database system is starting up

[26/Nov/2004:15:50:26][15250.1114426288][-sched:27-] Error: dbinit: error(localhost::project1,server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
): 'insert into txt (object_id,fti,title_fti,description_fti) values ( '1679', to_tsvector('default','Re: Satellite data ''could save fish'' test test'), to_tsvector('default','Re: Satellite data ''could save fish'''), to_tsvector('default','Re: Satellite data ''could save fish'' test test'))'
[26/Nov/2004:15:50:26][16065.1076324800][-main-] Notice: nsmain: AOLserver/4.0 starting
**********LOG *************

Any ideas ?

I am using AOLServer 4.0 and Postgres 7.5.
Note though that this also occurs on other versions of Postgres including 7.3 and 7.4

Thanks very much.

Hamilton G. Chua

Collapse
Posted by Dave Bauer on
Hamilton,

This is using tsearch2 which is quite different from tsearch.

Collapse
Posted by Jun Yamog on
Hi,

Both crashes are caused when restoring the db.  Not from a fresh install.  The tsearch2 problem can be easily resolved by droping the tsearch2 data model and restoring.  Use untsearch2.sql (remove the transaction block) and run tsearch2.sql again on pg contrib dir.

The other one happens on forums, well custom forums.  Has anyone exprienced this?

Here is the query, query above is the orig one.  Query below is a rewrite.

ctk=# explain    select f1.message_id as ancestor_message_id
    from forums_messages f1
    where f1.tree_sortkey in (select tree_ancestor_keys(f2.tree_sortkey)
                              from forums_messages f2
                              where f2.message_id = 1268350)
    and forum_id = 1204187;
ctk-# ctk-# ctk(# ctk(# ctk-#                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Nested Loop  (cost=5.69..10.98 rows=1 width=4)
  ->  Unique  (cost=5.69..5.69 rows=1 width=32)
        ->  Sort  (cost=5.69..5.69 rows=1 width=32)
              Sort Key: "IN_subquery".tree_ancestor_keys
              ->  Subquery Scan "IN_subquery"  (cost=0.00..5.68 rows=1 width=32)
                    ->  Index Scan using forums_messages_pk on forums_messages f2  (cost=0.00..5.67 rows=1 width=16)
                          Index Cond: (message_id = 1268350)
  ->  Index Scan using forums_mess_forum_sk_un on forums_messages f1  (cost=0.00..5.27 rows=1 width=20)
        Index Cond: ((f1.forum_id = 1204187) AND (f1.tree_sortkey = "outer".tree_ancestor_keys))
(9 rows)

ctk=# explain    select f1.message_id as ancestor_message_id
    from forums_messages f1,
        (select tree_ancestor_keys(tree_sortkey) as tree_sortkey from forums_messages where message_id = 1268350) parents
    where f1.tree_sortkey = parents.tree_sortkey
        and forum_id = 1204187;
ctk-# ctk-# ctk-# ctk-#                                                QUERY PLAN
---------------------------------------------------------------------------------------------------------
Nested Loop  (cost=0.00..10.96 rows=1 width=4)
  ->  Subquery Scan parents  (cost=0.00..5.68 rows=1 width=32)
        ->  Index Scan using forums_messages_pk on forums_messages  (cost=0.00..5.67 rows=1 width=16)
              Index Cond: (message_id = 1268350)
  ->  Index Scan using forums_mess_forum_sk_un on forums_messages f1  (cost=0.00..5.27 rows=1 width=20)
        Index Cond: ((f1.forum_id = 1204187) AND (f1.tree_sortkey = "outer".tree_sortkey))
(6 rows)

BTW I excuted both queries on a freshly installed system.  Executing the first one on a restored system, gets you booted off psql.  Is the query rewrite ok?

How can tree_ancestor_keys placed on a query causes the pg server to disconnect?  Placing the tree_ancestor_keys outside the "in" does not boot you off.  I still can't understand the exact cause of it.