Forum OpenACS Q&A: connect by patch?

Collapse
Posted by ron shigeta on
Hi,  I am looking over the connect by code in the
openacs port, hoping to use it in a new project.
Could someone comment on how much using
a function costs in time compared to Oracle's
built in function, say?

thanx

Ron

Collapse
Posted by Dan Wickstrom on
The connect by as implemented in openacs would probably be quite slow in comparison.  Not only does using functions slow it down, but the functions use recursion to walk the tree structure.  So for a deeply nested tree, you are doing multiple queries for each entry in the table.

I've recently been looking at Celko's book titled "Sql for Smarties".  In his book he outlines a method refered to as the "visitor representation of trees".  The method seems very promising, and would provide performance roughly equivalent to oracle's connect by statement.  When we start the 4.0 porting, I plan on using this method for porting any connect-by statements that I encounter.  I've been meaning to try this method out on the current file-storage module, but I haven't had the time.

Collapse
Posted by miguel sofer on
I have been exploring a different alternative for tree structures; it would also be efficient for some other hierarchical structures. I started to write it down, then got diverted to other issues ...

In any case, as long as your structures satisfy:

  • no node can have more than (say) 4 million children
  • the structure may not be deeper than (say) 100 levels
this representation may suit your needs better than the one from 'SQL for Smarties': it is as fast as that one on lookup, much faster on insert and delete.

If you want to take a look at a (very) preliminary description I started writing back in may, download the preliminary paper and coding function .

Do not hesitate to contact me if you are interested in the stuff and think I may help implementing it.

Collapse
Posted by Ben Adida on
I'm happy to see all of these creative approaches to optimizing
tree structures, which are really important in general (and
promise to become even more important in a lot of projects I'm
working on).

One thing that is often forgotten is that the bboard module uses
a hierarchy, with sort keys. It uses the simple property that once
an item is created, it rarely (if ever) changes position in the tree
structure. Thus, storing the sort key directly in the table is the
right way to go. Think of it as a "cached" sort key in a sense.

Collapse
Posted by ron shigeta on
hi guys!

this is great feedback... want to do some ACS inspired
software engineering, but the tree calls are a big part of
the scheme and slow connect by was making me wary of Postgres.

I will look over your suggestions... hopefully this will work
out well.... now tell me is there an easy way to benchmark my
queries?

ron

Collapse
Posted by Dan Wickstrom on
If you want to get relative performance results, use explain from psql.

psql>explain select * from some_table where some_attribute = 'some_value';

Will give you a cost estimate for the query.  I've also seen references on the pg hackers list about turning on extended logging whereby you can obtain more detailed information about each query such as the number of physical reads from the disk and the time in msecs for executing the query.  Unfortunately, I can't recall how this is done.

Collapse
7: Nested set solution (response to 1)
Posted by Sebastian Skracic on
OK, a while ago I've read about nested set model in Celko's book. Unfortunately, I don't have the book at hand, but I was fiddling with it this morning and it seems that things work 😊

This sample table is taken from ACS 4.0. acs_object_types implements classic child-parent relationships through object_type and supertype columns. Here's the DDL:

create table acs_object_types (
        object_type     varchar(100) not null
                        constraint acs_object_types_pk primary key,
        supertype       varchar(100) constraint acs_object_types_supertype_fk
                        references acs_object_types (object_type),
        abstract_p      char(1) default 'f' not null
                        constraint acs_obj_types_abstract_p_ck
                        check (abstract_p in ('t', 'f')),
        pretty_name     varchar(100) not null
                        constraint acs_obj_types_pretty_name_un
                        unique,
        pretty_plural   varchar(100) not null
                        constraint acs_obj_types_pretty_plural_un
                        unique,
        table_name      varchar(30) not null
                        constraint acs_object_types_tbl_name_un unique,
        id_column       varchar(30) not null,
        package_name    varchar(30) not null
                        constraint acs_object_types_pkg_name_un unique,
        name_method     varchar(30),
        type_extension_table varchar(30),
        l_node          integer not null,
        r_node          integer not null
);
There is separated trigger/procedure for INSERT, UPDATE and DELETE. Let's review them one by one.

  • INSERT

    This is fairly trivial, the only thing we have to do is to locate the parent of newly inserted record, if such exists.

    create function nested_set_insert_node () returns opaque as '
    declare
        rightmost_node    acs_object_types.r_node%TYPE;
        parent_r_node     acs_object_types.r_node%TYPE;
    begin
        --  We differentiate two cases, depending on whether the newly
        --  inserted has parent or not.
        if new.supertype is null then
            --  New node is added at the end of the nested set list.
            select coalesce(max(r_node),0) into rightmost_node from acs_object_types;
            new.l_node := rightmost_node + 1;
            new.r_node := rightmost_node + 2;
        else
            --  New node is inserted at the right node of the parent,
            --  shifting everything two places to the right.
            select r_node into parent_r_node from acs_object_types
              where object_type = new.supertype;
            update acs_object_types
              set l_node = l_node + 2
              where l_node > parent_r_node;
            update acs_object_types
              set r_node = r_node + 2
              where r_node >= parent_r_node;
            new.l_node := parent_r_node;
            new.r_node := parent_r_node + 1;
        end if;
        return new;
    end;
    ' language 'plpgsql';
    
    create trigger nested_set_insert_tr before insert
      on acs_object_types for each row
      execute procedure nested_set_insert_node() ;
    
  • Collapse
    Posted by Sebastian Skracic on
  • DELETE

    Deleting is no harder than inserting, and I've added a check that prevents deletion of rows having children (although referential integrity constraints would have jumped in anyway).

    create function nested_set_delete_node () returns opaque as '
    begin       
        --  We must prevent deletion of non-leaf nodes.
        if old.r_node - old.l_node <> 1 then
            raise exception ''Type "%" cannot be deleted, l_node = %  r_node = %'',
              old.object_type, old.l_node, old.r_node;
            return NULL;
        end if; 
        update acs_object_types set
            l_node = l_node - 2
            where l_node > old.r_node;
        update acs_object_types set
            r_node = r_node - 2
            where r_node > old.r_node;
        return old;
    end;
    ' language 'plpgsql';
    
    create trigger nested_set_delete_tr before delete
      on acs_object_types for each row
      execute procedure nested_set_delete_node() ;
    
  • Collapse
    Posted by Sebastian Skracic on
  • UPDATE

    UPDATE is slightly complicated. I don't know why BEFORE trigger did not produce correct results, so I switched to AFTER trigger which rectified the problem. One interesting bit is that you cannot use new.l_node and new.r_node since they can be obsoleted by UPDATE statements issued by the trigger itself. Anyway, I didn't optimize anything so it might be worth tuning this little bit to reduce the amount of UPDATEs.

    create function nested_set_update_node () returns opaque as '
    declare
        rightmost_node    acs_object_types.r_node%TYPE;
        nested_set_width  acs_object_types.r_node%TYPE;
        shift_offset      acs_object_types.r_node%TYPE;
        parent_row        acs_object_types%ROWTYPE;
        this_row          acs_object_types%ROWTYPE;
       
    begin
        --  We cannot use new.l_node and new.r_node since they can
        --  be superseded by bunch of UPDATEs in this very trigger
        --  (this happens if the trigger fires on more than 1 row).
        --  So we retrieve the actual value of l_node and r_node with this:
        select * into this_row
            from acs_object_types
            where object_type = new.object_type;
        --  We have to rearrange nodes only if supertype has been changed.
        --  This leads to 3 cases:
        --    1.  supertype changes from NOT NULL to NULL
        --    2.  supertype changes from NULL to something NOT NULL
        --    3.  supertype changes from one NOT NULL value to another
        if old.supertype = new.supertype then
            return new;
        end if;
        if old.supertype is null and new.supertype is null then
            return new;
        end if;
        --  In any case we must shift entire subset out of its parent
        --  and put it to the right of the rightmost node, to prevent
        --  it from being overwritten by node adjustments.
        select max(r_node) into rightmost_node from acs_object_types;
        nested_set_width := this_row.r_node - this_row.l_node + 1;
        shift_offset := rightmost_node + 1 - this_row.l_node;
        --  Shift nested subset out of its parent.
        update acs_object_types
            set l_node = l_node + shift_offset,
                r_node = r_node + shift_offset
            where l_node >= this_row.l_node
              and r_node <= this_row.r_node;
        --  It''s good to know that the [this_row.l_node,this_row.r_node] interval
        --  is now not occupied by nodes.
        --
        --  Case 1)
        --
        if old.supertype is not null and new.supertype is null then
            --  Since we have already lifted our subset out of its
            --  position, we will simply renumber the nodes to fill the gaps.
            update acs_object_types
                set l_node = l_node - nested_set_width
                where l_node > this_row.r_node;
            update acs_object_types
                set r_node = r_node - nested_set_width
                where r_node > this_row.r_node;
            return new;
        end if;
        --
        --  Case 2) and 3)
        --
        if new.supertype is not null then
            --  The tricky part here is that we must pay attention whether
            --  the gap is to the left or to the right of inserting point.
            select * into parent_row
                from acs_object_types
                where object_type = new.supertype;
                parent_row.l_node, parent_row.r_node;
            --  Where will this subset be moved?
            if parent_row.r_node < this_row.l_node then
                --  Gap is to the right of inserting point
                update acs_object_types set
                    l_node = l_node + nested_set_width
                    where l_node > parent_row.r_node
                    and l_node < this_row.l_node ;
                update acs_object_types set
                    r_node = r_node + nested_set_width
                    where r_node >= parent_row.r_node
                    and r_node < this_row.l_node ;
                --  Place the subset under its new parent
                shift_offset := rightmost_node + 1 - parent_row.r_node;
            else
                --  The gap is to the LEFT of inserting point
                update acs_object_types set
                    l_node = l_node - nested_set_width
                    where l_node <= parent_row.l_node
                    and l_node > this_row.r_node ;
                update acs_object_types set
                    r_node = r_node - nested_set_width
                    where r_node < parent_row.l_node
                    and r_node > this_row.r_node ;
                shift_offset := rightmost_node + nested_set_width - parent_row.l_node;      
            end if;
            update acs_object_types set
                r_node = r_node - shift_offset,
                l_node = l_node - shift_offset
                where l_node > rightmost_node ;
            return new;
        end if;
        return new;
    end;
    ' language 'plpgsql';
    
    create trigger nested_set_update_tr after update
      on acs_object_types for each row
      execute procedure nested_set_update_node() ;