Forum OpenACS Improvement Proposals (TIPs): TIP #29: [Implemented] Tree_sortkey improvements for performance and concurrency on postgresql

TIP: 29
Title: Tree_sortkey improvements for performance and concurrency on postgresql
State: Draft
Type: Project
OpenACS-Version: 5.0
Vote: Pending
Created: Thursday, 13 November 2003

Abstract

The current implementation of tree_sortkey for acs_objects has 2 serious flaws. The first is that it is easy for a busy system to produce duplicate tree_sortkey (since the current implementation does table scans without locking to determine the next available sortkey). The second is that the performance can be abysmal for larger databases (for example creating a dotlrn community can take 2 minutes on a system with 200k objects). By changing the implementation the performance can be improved by an order of magnitude and the risk of a duplicate tree_sortkey being generated reduced greatly (although not eliminated entirely).

Current Implementation

The current implementation uses a single tree_sortkey column in acs_objects and has triggers to maintain the relation between context_id and the tree sortkey.

Here is the current insert trigger

create function acs_objects_insert_tr () returns opaque as '
declare
        v_parent_sk     varbit default null;
        v_max_value     integer;
begin
        if new.context_id is null then
            select max(tree_leaf_key_to_int(tree_sortkey)) into v_max_value
              from acs_objects
             where context_id is null;
        else
            select max(tree_leaf_key_to_int(tree_sortkey)) into v_max_value
              from acs_objects
             where context_id = new.context_id;

            select tree_sortkey into v_parent_sk
              from acs_objects
             where object_id = new.context_id;
        end if;

        new.tree_sortkey := tree_next_key(v_parent_sk, v_max_value);

        return new;
end;' language 'plpgsql';

The performance and concurrency issues are rooted in the queries to select the max(tree_leaf_key_to_int(tree_sortkey)) for a nodes siblings. It does no locking and in many cases can force a table scan on acs_objects. Since it does not lock the table (and there is no particular row to lock), two processes simulataneously inserting into the same parent can end up with the same v_max_value (openacs.org has 244 duplicated tree_sortkeys in acs_objects -- currently they are all leaf nodes so it has not caused a problem but creating children of any of these would create an malformed tree).

Forums

Forums uses the following code to generate it's tree sortkey

create function forums_mess_insert_tr returns opaque as '
declare
    v_max_child_sortkey             forums_forums.max_child_sortkey%TYPE;
    v_parent_sortkey                forums_messages.tree_sortkey%TYPE;
begin
    if new.parent_id is null
    then
        select '''', max_child_sortkey
        into v_parent_sortkey, v_max_child_sortkey
        from forums_forums
        where forum_id = new.forum_id
        for update;

        v_max_child_sortkey := tree_increment_key(v_max_child_sortkey);

        update forums_forums
        set max_child_sortkey = v_max_child_sortkey
        where forum_id = new.forum_id;
    else
        select coalesce(tree_sortkey, ''''), max_child_sortkey
        into v_parent_sortkey, v_max_child_sortkey
        from forums_messages
        where message_id = new.parent_id
        for update;

        v_max_child_sortkey := tree_increment_key(v_max_child_sortkey);

        update forums_messages
        set max_child_sortkey = v_max_child_sortkey
        where message_id = new.parent_id;
    end if;

    new.tree_sortkey := v_parent_sortkey || v_max_child_sortkey;
    return new;
end; 

It uses max_child_sortkey in forums_forums for assigning the initial tree_sortkey for a thread and uses max_child_sortkey from the parent for generating the childrens tree_sortkey; It also uses select ... for update to lock the row which should insure that no duplicates get created (there is however one dup in the database, two new threads have the same root tree_sortkey although this may be the result of a historical bug in 7.2, OpenACS, or happened via direct manipulation of the DB).

Proposed change

I would like to implement the scheme used for forums with some small modifications:

  • Add a max_child_sortkey to acs_objects

  • recreate the tree_sortkey indexes eliminating duplicate entries (disambiguating via context_id), and using object_id as the root node (A change from forums and would eliminate having to have a parent table which contained the max_child_sortkey)

  • add a unique not null constraint to tree_sortkey

  • replace the triggers which maintain tree_sortkey

Upgradability

This should be a relatively straightforward upgrade for 4.6 or 5.0 as this code is identical in both places. The only difficulty arises in dealing with duplicated tree_sortkeys with children (and in the case of acs_objects the context_id should allow the tree_sortkey to be corrected with no data lost or incorrect inheritance). And it should be entirely transparent to all existing code.

Notes

The same problem exists in a number of other places (most notably in the cr_items table) and should be fixed similiarly.

References

Jeff Davis

acs_object__new slowness... https://openacs.org/forums/message-view?message_id=142769

Patrick McNeill

tree_sortkey race condition, with fix https://openacs.org/forums/message-view?message_id=79102

Don Baccus

New implementation of tree_sortkey https://openacs.org/forums/message-view?message_id=27402

Jade Rubick

New documentation for tree_sortkey https://openacs.org/forums/message-view?message_id=112943

Sebastian Skracic

CONNECT BY solution https://openacs.org/forums/message-view?message_id=16799

I would like to call for 2 votes on this:

1. Is the plan acceptible?

2. Should it go in for 5.0?

Obviously I vote yes on both counts and given the short horizon of the 5.0 release I would like to get a quick decision on this.

I am planning to do the work if it's approved...

Collapse
Posted by Dave Bauer on
.
Seconded for both questions. Yes, it is acceptible and Yes, it should go in for 5.0.

Sadly nothing will be done til one week, due to the TIP process rules, as pointed out.

Well, if there are no NO votes I will go ahead and implement/test and just hold off for the week...
No vetos you wanted to say, not no votes ... confusing the hell out of me ;).

If you can do this for 5.0, great!

Approved on both questions. Thanks Jeff!
Collapse
8: Approved ... (response to 1)
Posted by Don Baccus on
Jeff - tree_sortkeys already have an index on them, your upgrade scripts should drop them and recreate them as unique indexes rather than simply add a unique constraint - we don't want two indexes on each of these!
Collapse
Posted by Lars Pind on
Approved.

As soon as everybody on the OCT has approved, we should be good to go, since there'd be no-one left to nay it.

Collapse
Posted by Roberto Mello on
It would be great to have this in 5.0.

Thanks Jeff,

-Roberto

Collapse
Posted by Tilmann Singer on
Approved.
Collapse
Posted by Jeff Davis on
I made the change in a clean install and tested it for concurrency problems and with 5 simultaneous scripts creating new users I was about to create 50k users (100k acs_objects) without any duplicated keys. There were no deadlocks or any appreciable decline in the speed of account creation versus doing just one thread creating accounts (i.e. one was taking ~100ms per acct and 5 were taking ~500ms per account for about the same total accounts/sec being created).

I did find a concurrency problem with the acs_objects_context_id_up_tr trigger (which maintains the acs_object_context_index table), which only manifested itself because of a second bug in the trigger. The plain old bug was simple...it said:

  if new.object_id = old.object_id and
     new.context_id = old.context_id and
     new.security_inherit_p = old.security_inherit_p then
    return new;
  end if;
but if new.context_id and old.context_id were null it would still evaluate the trigger code since new.context_id = old.context_id is false if both are null.

Changing it to this:

 
if new.object_id = old.object_id
     and ((new.context_id = old.context_id)
	  or (new.context_id is null and old.context_id is null))
     and new.security_inherit_p = old.security_inherit_p then
    return new;
end if;
fixes that. Although the concurrency problem still exists, it is far less likely that it would be encountered with this fix applied.
Can you elaborate on what the remaining concurrency problem is?
Its possible it's not really a concurrency bug and just a bug that manifested itself because unchanged null context_ids where falling through (I really did not investigate it once I realized it should not have been running at all in this case). I don't think thats the case though since running a single thread should have the same codepath and I didn't get the error there.

Anyway, here is the postgresql error log.....

2003-11-15 15:53:10 LOG:  query:
            select acs__add_user(
                '3516',
                'user',
                now(),
                null,
                '127.0.0.1',
                '9',
                'test-user10551',
                'test-10551@example.com',
                NULL,
                'user10551',
                'test',
                'F8DFBF53079A6306A7EBE11A36534549E09A70',
                'A315004DDE1F3CA754561EFD287B5E5F06B9E4',
                NULL,
                't',
                'approved'
            );
2003-11-15 15:53:10 ERROR:  Cannot insert a duplicate key into unique index acs_object_context_index_pk
2003-11-15 15:53:10 WARNING:  Error occurred while executing PL/pgSQL function acs_objects_context_id_up_tr
2003-11-15 15:53:10 WARNING:  line 28 at SQL statement
Collapse
Posted by Tom Jackson on

Thanks Jeff for doing this. This is one of those non-glamorous jobs that is probably also tedious, but I think it benefits everyone.

Approved. Thanks Jeff.
Thanks again Jeff. I just applied this fix and the others from upgrade5.0.0b1 to b2 to b3 to b4 to a OACS4.6.3 site running pg 7.3 with 200,000 objects. It seems to work fine on 4.6.3 and it was a huge improvement in performance!

This is a great benefit to the toolkit.

For anyone who wants to experiment with these fixes on 4.6.3 the upgrades scripts I used are here:

http://cvs.openacs.org/cvs/openacs-4/packages/acs-kernel/sql/postgresql/upgrade/Attic/