Forum OpenACS Development: Weird error invoking a plpgsql function (content_revision__delete)

I need to use the API to delete a content_revision, and I'm getting a strange error from postgres:
openacs-4=# select content_revision__delete(2371);
ERROR:  parser: parse error at or near "select"
Can anyone try this out on their installation and let me know if they're able to delete a row from cr_revisions? This seems to be an error in invoking the function, rather than a syntax error inside the function - it looks to be ported correctly.


This is related to a known referential integrity bug in postgresql.  Vinod has been following up on this, but I don't think we have heard anything back yet?  I'm going to take a look at this, and see if I can rewrite it to avoid the pg bug until we can get a fix.
Thanks Dan, that would be great! While you're working on that file, you might want to paste in my rewrite of the get_number function, which wasn't working at all previously:
-- function get_number
create function content_revision__get_number (integer)
returns integer as '
  p_revision_id            alias for $1;  
  v_row_count              integer;
  rev_cur                  record;
  v_row_count := 0;
  for rev_cur in select
                   cr_revisions r, acs_objects o
                   item_id = (select item_id from cr_revisions 
                               where revision_id = p_revision_id)
                   o.object_id = r.revision_id
                 order by
    v_row_count := v_row_count + 1;
    if rev_cur.revision_id = p_revision_id then 
       return v_row_count;
    end if;
  end LOOP;

  return null; 
' language 'plpgsql';
Unfortunately, it doesn't look like we're going to get a fix anytime soon.

Here's the response that I got from Stephan Szabo:

> It should be fixed at some point.  I'm not 100% sure what's 
> going to be involved, I'm planning on looking at this for the
> deferred state anyway.  I was hoping that one of the hackers
> would read my proposed solution and tell me if they thought it
> was a good idea.
> However, this may actually technically be an error case if 
> the function runs as one statement.  I've got to figure out 
> the triggered data change stuff in SQL99. :(

Here's the thread where I described the bug and Stephan responded. Unfortunately, this is way over my head, so if someone with more experience understands what is going on, feel free to jump in!

I also found another (more minor) bug in pg 7.1.2 which I mentioned earlier to Dan. That has been fixed in 7.2 (in development) and will be fixed in 7.1.3, if they decide to put out another 7.1.x release. So, for the moment, we should just work around it.

I'll update cvs with the get_number fix.  As far as the content_revision__delete problem, I'll see if I can rewrite it to avoid the problem, since it doesn't sound like we'll get a pg fix any time soon.
I've added the fix for content_revision__get_number, and I've also added a work-around for the content_revision__delete problem.  Let me know if you have any problems with it.
Hi Dan,

I got your latest changes to content_revision__delete. At least I can call the function now, but I'm getting the following error:

[18/Jun/2001:10:36:35][21372.1635333][-conn96-] Error: dbinit: error(localhost::
openacs-4,ERROR:  triggered data change violation on relation "cr_item_publish_audit"
): '

select content_revision__delete('2655')

Is this the PG bug that you know about? It seems very strange that the trigger would be running, even when the revision I'm deleting is not the latest or live revision.


I'm not able to reproduce your problem.  Could you give me more information about how your calling the psql function.  Is it called from within a .tcl script, a transaction or a pl/pgsql function?  Could you post what's in cr_items and the cr_revisions tables?  Also, if you could create a test script that reproduces the problem that would be great.
The error in my previous message seems to be related to calling it from tcl within a db_transaction, because I just tried it from the psql command line and it worked. My tcl code implements a "revert" operation as follows:
    db_transaction {
	db_1row get_item_id ""

	db_dml set_live_revision ""
	set revision_list [db_list revisions_to_delete ""]

	foreach revision_id $revision_list {
	    db_exec_plsql delete_revision ""
So, given a revision_id that predates the current, live revision, the code makes it the live revision and deletes all those that follow it. Let me know if this helps you reproduce it.
Ok thanks.  I'll take a look at this tomorrow and see if I can reproduce it.  These data-change violations are due to some RI changes that were made for pg 7.1, but we should be able to find a work-around.
I had a chance to look at this earlier than I thought I would.  In postgresql a data-change violation is triggered when a column that is a foreign key is changed more than once in the course of a transaction.  In your example, the update causes the cr_items_publish_update_tr to insert a new row into the cr_item_publish_audit table, and the content_revision__delete method will cause the corresponding row to be deleted from the same table. All during the course of the same transaction.  This triggers the data-change violation.

So far the only work-around that I can think of that preserves the auditing functionality is to remove the foreign key constraints from all of the columns in the cr_item_publish_audit table.  Doing this, you avoid the data-change violation, and you still retain the auditing information.  This is not an ideal solution, but it's probably acceptable until we get a fix from the postgresql team.

Thank you for the clear explanation Dan.  I dropped all the constraints on cr_item_publish_audit, but then the trigger procedure started saying "ERROR:  Relation 264049 does not exist".  Since I had no idea how to fix that, I just gave up and dropped the trigger, and now everything works.  I wasn't planning to rely on the auditing stuff anyway so I guess it's okay.
I ran into the same problem when I was testing this.  The problem is that the pl/pgsql functions cache references to things like tables, foreign key triggers and other pl/pgsql functions.  So when you drop a function or a table, you need to track down all of the dependencies and recreate the objects that depended on the original object that you changed.  For intance, when I changed the cr_item_publish_audit table, I had to drop and recreate the cr_items_publish_update_tr trigger because it refered to the cr_item_publish_audit table.  I also had to drop and recreate the content_revision__delete method, because it also refered to the cr_item_publish_audit table.  I then had to drop and recreate the content_item__delete method. because it depended on the content_revision__delete method.  It's easy to see how this can get unmanagable very quickly.  If you need to drop and recreate some core piece of your db schema, it's usually easier to drop the whole db and recreate it from scratch with your new changes incorporated in the data-model.

It's very important to watch these dependencies when your're doing maintenance on a production site.  You might not have the luxury of taking the site down and reloading it from scratch, and you could end with something that is broken if you make a simple change and don't follow all of the dependencies.

This is a definite weakness for postgresql, but I think they have added it to their todo list, and they plan to resolve it in some future release.

How timely. I am trying to construct a regression test of an acs service (I'm helping out Rafael port acs-events). I can trigger the data change violation if, as Dan pointed out, in one transaction I insert and delete a row in a table whose primary key is a foreign key of another table. This happens even if the other table is not involved at all. Here's a simplified script that will reproduce the data change violation.
create table ut_time_intervals (
    interval_id         integer
                        primary key,
    start_date          timestamp,
    end_date            timestamp,
    constraint ut_time_interval_date_order_ck
    check(start_date <= end_date)

-- Need a source of primary keys
create sequence ut_timespan_seq;

-- Make a table that references the primary key of above
-- If this table is not created, no data-change violation is
-- triggered. Note that this table is not used by the functions below.
create table ut_timespans (
    timespan_id     integer not null,
    interval_id     integer
                    references ut_time_intervals on delete cascade

create  function ut_time_interval__new (timestamp,timestamp)
returns integer as '
        new__start_date  alias for $1; -- default null,
        new__end_date    alias for $2; -- default null
        v_interval_id    ut_time_intervals.interval_id%TYPE;
        select nextval(''ut_timespan_seq'') into v_interval_id;

        insert into ut_time_intervals
            (interval_id, start_date, end_date)
            (v_interval_id, new__start_date, new__end_date);

        return v_interval_id;

end;' language 'plpgsql';
-- end new;

create function ut_time_interval__delete (
        integer               -- in time_intervals.interval_id%TYPE
returns integer as '
        delete__interval_id     alias for $1;

        delete from ut_time_intervals
        where  interval_id = delete__interval_id;

        return 0;

end;' language 'plpgsql';
-- end delete;

create function simple_regression()
returns integer as '
        date1   timestamp := ''2001-01-01'';
        date2   timestamp := ''2001-01-02'';
        v_id    time_intervals.interval_id%TYPE;

        -- Do insertion into the table and deletion in same
        -- transaction to trigger a data change violation

	-- Create an entry
        v_id := ut_time_interval__new(date1,date2);

	-- Other regression tests presumably done here.

	-- Delete the entry.  Will trigger a data change violation
	-- if a table exists (i.e., ut_timespans) that references 
        -- the primary key
        PERFORM ut_time_interval__delete(v_id);

        return 0;

end;' language 'plpgsql';

select (case when simple_regression() = 0
                 'Regression test successful'
             end) as regression_result;

-- Clean up
drop table ut_timespans;
drop table ut_time_intervals;
drop sequence ut_timespan_seq;
drop function ut_time_interval__new(timestamp,timestamp);
drop function ut_time_interval__delete(integer);
drop function simple_regression();
I guess the point is that the "bug" can also affect regression testing, not only audit tables. Dan, is there a regression toolkit similar to utplsql in Oracle? Since a service does not have any "interface", the only way to test it is to actually construct a test. BTW, I am using PG 7.1.1.
I should have elaborated more in my previous post.  In your example, the foreign key constraint on ut_timespans also places referential integrity triggers on both the ut_time_intervals table and the ut_timespans table, so you'll get the data-change violation on either table if you change a row more than once inside of a transaction.  You can also get the error by any combination of inserts, updates or deletes.

I looked at the utplsql package when I first encountered it, and I noticed that it was released under an open-source licence, so it might be possible to port it to work with postgresql.  However that being said, I don't know that the effort would be worthwhile.  Most of test scripts that used the utplsql package were dead simple to convert.  If we wanted to develop a more elaborate regression test suite for openacs, then it might be worthwhile to port the utplsql package.

I'm not aware of regression test suites that work for postgresql, maybe someone else can jump in here and suggest one.

Given a plpgsql error that refers to a missing cached function or relation, is there a way to look up which function or relation is described by the id in the error message?
If you delete a relation, then the relation will no longer be in the system catalogs.

You can see how this works in a simple test case:

-- file tst.sql
create table test_ids (
       val   integer

create function test_id_drop () returns integer as '
        v_val   integer;
        select val into v_val from test_ids where val = 3;
        return null;
end;' language 'plpgsql';

openacs4=# i tst.sql
openacs4=# select oid from pg_class where relname = 'test_ids';
(1 row)

openacs4=# select test_id_drop();
(1 row)

openacs4=# drop table test_ids;
openacs4=# select test_id_drop();
ERROR:  Relation 249461 does not exist
openacs4=# select oid from pg_class where relname = 'test_ids';
(0 rows)


Although posting any more on this topic might be flogging a dead horse, I thought I'd just verify that another error I'm seeing is also due to the data-change violation. The reason I'm not quite sure is that I'm getting a different error message about a foreign key.
[05/Jul/2001:10:28:24][6730.24581][-conn3-] Notice: Querying '
select content_item__delete('2364');'
NOTICE:  Deleting associated workflows...
NOTICE:  Deleting symlinks...
NOTICE:  Unscheduling item...
NOTICE:  Deleting associated revisions...
NOTICE:  Deleting associated item templates...
NOTICE:  Deleting item relationships...
NOTICE:  Deleting child relationships...
NOTICE:  Deleting parent relationships...
NOTICE:  Deleting associated permissions...
NOTICE:  Deleting keyword associations...
NOTICE:  Deleting associated comments...
NOTICE:  identifier "journal_entry__delete_for_object" will be truncated to "journal_entry__delete_for_objec"
NOTICE:  Deleting content item...
[05/Jul/2001:10:28:24][6730.24581][-conn3-] Error: Ns_PgExec: result
status: 7 message: ERROR:  cr_items_latest_fk referential integrity
violation - key referenced from cr_items not found in cr_revisions
The reason I think this is related to the above discussion? Looking at the code for content_item__delete, I see it calls content_revision__delete in a loop. Each of these calls potentially updates the item's "latest_revision" foreign key. I don't know why the error doesn't happen until you try to delete the item, but there you go.

I implemented a successful workaround, which is to set latest_revision and live_revision to null in a separate transaction, and then calling content_item__delete.

Is there any way to work around this problem in the definition of content_item__delete, or are we stuck waiting for the bug to get fixed in postgresql?

The data-change violation and the problem that you're seeing here are both related to problems with referential integrity triggers in pg.  In this particular case, I think content_item__delete can be rewritten.  There really is no need to update live_revision and latest_revision when deleting the content_item, so I think that another function, that avoids this problem, can be written to delete the content revisions.  I'll give it a shot and let you know if I get something that works.
Ok Luke, try this:

create function content_item__delete (integer)
returns integer as '
  delete__item_id                alias for $1;  
  v_wf_cases_val                 record;
  v_symlink_val                  record;
  v_revision_val                 record;
  v_rel_val                      record;
  v_rec                          record;

  raise NOTICE ''Deleting associated workflows...'';
  -- 1) delete all workflow cases associated with this item
  for v_wf_cases_val in select
                          object_id = delete__item_id 
    PERFORM workflow_case__delete(v_wf_cases_val.case_id);
  end loop;

  raise NOTICE ''Deleting symlinks...'';
  -- 2) delete all symlinks to this item
  for v_symlink_val in select 
                         target_id = delete__item_id 
    PERFORM content_symlink__delete(v_symlink_val.symlink_id);
  end loop;

  raise NOTICE ''Unscheduling item...'';
  delete from cr_release_periods
    where item_id = delete__item_id;

  raise NOTICE ''Deleting associated revisions...'';
  -- 3) delete all revisions of this item
  delete from cr_item_publish_audit
    where item_id = delete__item_id;

  for v_rec in select revision_id
               from cr_revisions 
               where item_id = delete__item_id 
        PERFORM acs_object__delete(v_rec.revision_id);
  end LOOP;

  raise NOTICE ''Deleting associated item templates...'';
  -- 4) unregister all templates to this item
  delete from cr_item_template_map
    where item_id = delete__item_id; 

  raise NOTICE ''Deleting item relationships...'';
  -- Delete all relations on this item
  for v_rel_val in select
                     item_id = delete__item_id
                     related_object_id = delete__item_id 
    PERFORM acs_rel__delete(v_rel_val.rel_id);
  end loop;  

  raise NOTICE ''Deleting child relationships...'';
  for v_rel_val in select
                     child_id = delete__item_id 
    PERFORM acs_rel__delete(v_rel_val.rel_id);
  end loop;  

  raise NOTICE ''Deleting parent relationships...'';
  for v_rel_val in select
                     rel_id, child_id
                     parent_id = delete__item_id 
    PERFORM acs_rel__delete(v_rel_val.rel_id);
    PERFORM content_item__delete(v_rel_val.child_id);
  end loop;  

  raise NOTICE ''Deleting associated permissions...'';
  -- 5) delete associated permissions
  delete from acs_permissions
    where object_id = delete__item_id;

  raise NOTICE ''Deleting keyword associations...'';
  -- 6) delete keyword associations
  delete from cr_item_keyword_map
    where item_id = delete__item_id;

  raise NOTICE ''Deleting associated comments...'';
  -- 7) delete associated comments
  PERFORM journal_entry__delete_for_object(delete__item_id);

  -- context_id debugging loop
  --for v_error_val in c_error_cur loop
  --  raise NOTICE ''ID='' || v_error_val.object_id || '' TYPE='' 
  --    || v_error_val.object_type);
  --end loop;

  raise NOTICE ''Deleting content item...'';
  PERFORM acs_object__delete(delete__item_id);

  return 0; 
end;' language 'plpgsql';