Forum OpenACS Q&A: connect by patch?

1: connect by patch?

Posted by ron shigeta on 10/11/00 05:55 PM

Hi, I am looking over the connect by code in the
openacs port, hoping to use it in a new project.
Could someone comment on how much using
a function costs in time compared to Oracle's
built in function, say?

thanx

Ron

2: Response to connect by patch? (response to 1)

Posted by Dan Wickstrom on 10/11/00 07:22 PM

The connect by as implemented in openacs would probably be quite slow in comparison. Not only does using functions slow it down, but the functions use recursion to walk the tree structure. So for a deeply nested tree, you are doing multiple queries for each entry in the table.

I've recently been looking at Celko's book titled "Sql for Smarties". In his book he outlines a method refered to as the "visitor representation of trees". The method seems very promising, and would provide performance roughly equivalent to oracle's connect by statement. When we start the 4.0 porting, I plan on using this method for porting any connect-by statements that I encounter. I've been meaning to try this method out on the current file-storage module, but I haven't had the time.

3: Response to connect by patch? (response to 1)

Posted by miguel sofer on 10/11/00 08:47 PM

I have been exploring a different alternative for tree structures; it would also be efficient for some other hierarchical structures. I started to write it down, then got diverted to other issues ...

In any case, as long as your structures satisfy:

no node can have more than (say) 4 million children
the structure may not be deeper than (say) 100 levels

this representation may suit your needs better than the one from 'SQL for Smarties': it is as fast as that one on lookup, much faster on insert and delete.

If you want to take a look at a (very) preliminary description I started writing back in may, download the preliminary paper and coding function .

Do not hesitate to contact me if you are interested in the stuff and think I may help implementing it.

4: Response to connect by patch? (response to 1)

Posted by Ben Adida on 10/11/00 08:52 PM

I'm happy to see all of these creative approaches to optimizing
tree structures, which are really important in general (and
promise to become even more important in a lot of projects I'm
working on).

One thing that is often forgotten is that the bboard module uses
a hierarchy, with sort keys. It uses the simple property that once
an item is created, it rarely (if ever) changes position in the tree
structure. Thus, storing the sort key directly in the table is the
right way to go. Think of it as a "cached" sort key in a sense.

5: Response to connect by patch? (response to 1)

Posted by ron shigeta on 10/12/00 09:04 PM

hi guys!

this is great feedback... want to do some ACS inspired
software engineering, but the tree calls are a big part of
the scheme and slow connect by was making me wary of Postgres.

I will look over your suggestions... hopefully this will work
out well.... now tell me is there an easy way to benchmark my
queries?

ron

6: Response to connect by patch? (response to 1)

Posted by Dan Wickstrom on 10/12/00 09:17 PM

If you want to get relative performance results, use explain from psql.

psql>explain select * from some_table where some_attribute = 'some_value';

Will give you a cost estimate for the query. I've also seen references on the pg hackers list about turning on extended logging whereby you can obtain more detailed information about each query such as the number of physical reads from the disk and the time in msecs for executing the query. Unfortunately, I can't recall how this is done.

7: Nested set solution (response to 1)

Posted by Sebastian Skracic on 11/01/00 03:03 PM

OK, a while ago I've read about nested set model in Celko's book. Unfortunately, I don't have the book at hand, but I was fiddling with it this morning and it seems that things work 😊

This sample table is taken from ACS 4.0. acs_object_types implements classic child-parent relationships through object_type and supertype columns. Here's the DDL:

create table acs_object_types (
        object_type     varchar(100) not null
                        constraint acs_object_types_pk primary key,
        supertype       varchar(100) constraint acs_object_types_supertype_fk
                        references acs_object_types (object_type),
        abstract_p      char(1) default 'f' not null
                        constraint acs_obj_types_abstract_p_ck
                        check (abstract_p in ('t', 'f')),
        pretty_name     varchar(100) not null
                        constraint acs_obj_types_pretty_name_un
                        unique,
        pretty_plural   varchar(100) not null
                        constraint acs_obj_types_pretty_plural_un
                        unique,
        table_name      varchar(30) not null
                        constraint acs_object_types_tbl_name_un unique,
        id_column       varchar(30) not null,
        package_name    varchar(30) not null
                        constraint acs_object_types_pkg_name_un unique,
        name_method     varchar(30),
        type_extension_table varchar(30),
        l_node          integer not null,
        r_node          integer not null
);

There is separated trigger/procedure for INSERT, UPDATE and DELETE. Let's review them one by one.

INSERT

This is fairly trivial, the only thing we have to do is to locate the parent of newly inserted record, if such exists.

create function nested_set_insert_node () returns opaque as '
declare
    rightmost_node    acs_object_types.r_node%TYPE;
    parent_r_node     acs_object_types.r_node%TYPE;
begin
    --  We differentiate two cases, depending on whether the newly
    --  inserted has parent or not.
    if new.supertype is null then
        --  New node is added at the end of the nested set list.
        select coalesce(max(r_node),0) into rightmost_node from acs_object_types;
        new.l_node := rightmost_node + 1;
        new.r_node := rightmost_node + 2;
    else
        --  New node is inserted at the right node of the parent,
        --  shifting everything two places to the right.
        select r_node into parent_r_node from acs_object_types
          where object_type = new.supertype;
        update acs_object_types
          set l_node = l_node + 2
          where l_node > parent_r_node;
        update acs_object_types
          set r_node = r_node + 2
          where r_node >= parent_r_node;
        new.l_node := parent_r_node;
        new.r_node := parent_r_node + 1;
    end if;
    return new;
end;
' language 'plpgsql';

create trigger nested_set_insert_tr before insert
  on acs_object_types for each row
  execute procedure nested_set_insert_node() ;

8: Nested set solution (cont.) (response to 1)

Posted by Sebastian Skracic on 11/01/00 03:07 PM

DELETE

Deleting is no harder than inserting, and I've added a check that prevents deletion of rows having children (although referential integrity constraints would have jumped in anyway).

create function nested_set_delete_node () returns opaque as '
begin       
    --  We must prevent deletion of non-leaf nodes.
    if old.r_node - old.l_node <> 1 then
        raise exception ''Type "%" cannot be deleted, l_node = %  r_node = %'',
          old.object_type, old.l_node, old.r_node;
        return NULL;
    end if; 
    update acs_object_types set
        l_node = l_node - 2
        where l_node > old.r_node;
    update acs_object_types set
        r_node = r_node - 2
        where r_node > old.r_node;
    return old;
end;
' language 'plpgsql';

create trigger nested_set_delete_tr before delete
  on acs_object_types for each row
  execute procedure nested_set_delete_node() ;

9: Nested set solution (cont.) (response to 1)

Posted by Sebastian Skracic on 11/01/00 03:19 PM

UPDATE

UPDATE is slightly complicated. I don't know why BEFORE trigger did not produce correct results, so I switched to AFTER trigger which rectified the problem. One interesting bit is that you cannot use new.l_node and new.r_node since they can be obsoleted by UPDATE statements issued by the trigger itself. Anyway, I didn't optimize anything so it might be worth tuning this little bit to reduce the amount of UPDATEs.

create function nested_set_update_node () returns opaque as '
declare
    rightmost_node    acs_object_types.r_node%TYPE;
    nested_set_width  acs_object_types.r_node%TYPE;
    shift_offset      acs_object_types.r_node%TYPE;
    parent_row        acs_object_types%ROWTYPE;
    this_row          acs_object_types%ROWTYPE;
   
begin
    --  We cannot use new.l_node and new.r_node since they can
    --  be superseded by bunch of UPDATEs in this very trigger
    --  (this happens if the trigger fires on more than 1 row).
    --  So we retrieve the actual value of l_node and r_node with this:
    select * into this_row
        from acs_object_types
        where object_type = new.object_type;
    --  We have to rearrange nodes only if supertype has been changed.
    --  This leads to 3 cases:
    --    1.  supertype changes from NOT NULL to NULL
    --    2.  supertype changes from NULL to something NOT NULL
    --    3.  supertype changes from one NOT NULL value to another
    if old.supertype = new.supertype then
        return new;
    end if;
    if old.supertype is null and new.supertype is null then
        return new;
    end if;
    --  In any case we must shift entire subset out of its parent
    --  and put it to the right of the rightmost node, to prevent
    --  it from being overwritten by node adjustments.
    select max(r_node) into rightmost_node from acs_object_types;
    nested_set_width := this_row.r_node - this_row.l_node + 1;
    shift_offset := rightmost_node + 1 - this_row.l_node;
    --  Shift nested subset out of its parent.
    update acs_object_types
        set l_node = l_node + shift_offset,
            r_node = r_node + shift_offset
        where l_node >= this_row.l_node
          and r_node <= this_row.r_node;
    --  It''s good to know that the [this_row.l_node,this_row.r_node] interval
    --  is now not occupied by nodes.
    --
    --  Case 1)
    --
    if old.supertype is not null and new.supertype is null then
        --  Since we have already lifted our subset out of its
        --  position, we will simply renumber the nodes to fill the gaps.
        update acs_object_types
            set l_node = l_node - nested_set_width
            where l_node > this_row.r_node;
        update acs_object_types
            set r_node = r_node - nested_set_width
            where r_node > this_row.r_node;
        return new;
    end if;
    --
    --  Case 2) and 3)
    --
    if new.supertype is not null then
        --  The tricky part here is that we must pay attention whether
        --  the gap is to the left or to the right of inserting point.
        select * into parent_row
            from acs_object_types
            where object_type = new.supertype;
            parent_row.l_node, parent_row.r_node;
        --  Where will this subset be moved?
        if parent_row.r_node < this_row.l_node then
            --  Gap is to the right of inserting point
            update acs_object_types set
                l_node = l_node + nested_set_width
                where l_node > parent_row.r_node
                and l_node < this_row.l_node ;
            update acs_object_types set
                r_node = r_node + nested_set_width
                where r_node >= parent_row.r_node
                and r_node < this_row.l_node ;
            --  Place the subset under its new parent
            shift_offset := rightmost_node + 1 - parent_row.r_node;
        else
            --  The gap is to the LEFT of inserting point
            update acs_object_types set
                l_node = l_node - nested_set_width
                where l_node <= parent_row.l_node
                and l_node > this_row.r_node ;
            update acs_object_types set
                r_node = r_node - nested_set_width
                where r_node < parent_row.l_node
                and r_node > this_row.r_node ;
            shift_offset := rightmost_node + nested_set_width - parent_row.l_node;      
        end if;
        update acs_object_types set
            r_node = r_node - shift_offset,
            l_node = l_node - shift_offset
            where l_node > rightmost_node ;
        return new;
    end if;
    return new;
end;
' language 'plpgsql';

create trigger nested_set_update_tr after update
  on acs_object_types for each row
  execute procedure nested_set_update_node() ;