Forum OpenACS Development: CONNECT BY solution

Collapse
Posted by Sebastian Skracic on
As we all know, Oracle's CONNECT BY clause enables us to traverse data in simple hierarchies. In its heart, CONNECT BY establishes parent-child relationship between two rows based on specified criterion. To be able to appropriately use CONNECT BY, our data model must provide 'child' and 'parent' slots:
create table acs_object_types (
        object_type     varchar(100) not null
                        constraint acs_object_types_pk primary key,
        supertype       varchar(100) constraint
acs_object_types_supertype_fk
                        references acs_object_types (object_type),
        ...
);
Suppose we have hiearchy defined with these entries:
   object_type   | supertype
-----------------+------------
 acs_object      |
 relationship    | acs_object
 party           | acs_object
 person          | party
 user            | person
 group           | party
 membership_rel  | acs_object
 composition_rel | acs_object
 journal_entry   | acs_object
 site_node       | acs_object
Or, presented as hierarchy tree:
  • acs_object
    • relationship
    • party
      • person
        • user
      • group
    • membership_rel
    • composition_rel
    • journal_entry
    • site_node
In Oracle, querying for all subtypes of 'party' object_type is performed with:
  select
    object_type
  from
    acs_object_types
  start with
    object_type = 'party'
  connect by
    object_type = prior supertype;
Shortly, we denote root of subtree we want to take out of hierarchy with START WITH clause, and criterion which maps child row to its parent with CONNECT BY clause.

Let's build the simple view which will hold all type-supertype mappings, regardless of number of generations involved:

  create or replace view acs_object_type_supertype_map
  as select ot.object_type, ota.object_type as ancestor_type
     from acs_object_types ot, acs_object_types ota
     where ota.object_type in (select object_type
                               from acs_object_types
                               start with object_type = ot.supertype
                               connect by object_type = prior
supertype);

  select object_type, ancestor_type
    from acs_object_type_supertype_map
    order by object_type, ancestor_type;

   object_type   |  ancestor_type
-----------------+-----------------
 acs_object      | acs_object
 composition_rel | acs_object
 composition_rel | composition_rel
 group           | acs_object
 group           | group
 group           | party
 journal_entry   | acs_object
 journal_entry   | journal_entry
 membership_rel  | acs_object
 membership_rel  | membership_rel
 party           | acs_object
 party           | party
 person          | acs_object
 person          | party
 person          | person
 relationship    | acs_object
 relationship    | relationship
 site_node       | acs_object
 site_node       | site_node
 user            | acs_object
 user            | party
 user            | person
 user            | user
(23 rows)
Detailed explanation of handling trees in Oracle is available at Trees in Oracle SQL chapter in SQL for Web Nerds.

Nested set model

In contrast to parent-child model of representing trees, nested set model (as described in Celko's SQL for smarties), represents hierarchy by assigning two numbers (left and right node) to each element. Parent elements simply nests its children which can further have its own children etc:
create table acs_object_types (
        object_type     varchar(100) not null
                        constraint acs_object_types_pk primary key,
        l_node          integer not null,
        r_node          integer not null,
        ...
);
Previously shown hierarchy can be represented with following entries in nested set model:
   object_type   | l_node | r_node
-----------------+--------+--------
 acs_object      |      1 |     20
 relationship    |      2 |      3
 party           |      4 |     11
 person          |      5 |      8
 user            |      6 |      7
 group           |      9 |     10
 membership_rel  |     12 |     13
 composition_rel |     14 |     15
 journal_entry   |     16 |     17
 site_node       |     18 |     19
Or illustrated:
1 2    3  4 5 6    7 8 9    10 11 12  13  14  15 16  17 18   19 20
| |    |  | | |    | | |     |  | |    |  |    | |    |  |    |  |
| relnshp | | +user+ | +group+  | membrel comp_r jrn_ent sit_nd  |
|         | +-person-+          |                                |
|         +--------party--------+                                |
+---------------------------acs_object---------------------------+
Searching for all subtypes of 'party' object_type can be performed by:
  select
    object_type
  from
    acs_object_types
  where
    l_node > 4
    and r_node < 11;

 object_type
-------------
 person
 user
 group
Similarly, one can search for all supertypes of 'user' object_type:
  select
    object_type
  from
    acs_object_types
  where
    l_node < 6
    and r_node > 7;

 object_type
-------------
 person
 party
 acs_object
Again, building parent-child mappings spanning multiple generations is nothing more difficult than with Oracle-specific CONNECT BY:
  create view acs_object_type_supertype_map
  as select ot.object_type, ota.object_type as ancestor_type
     from acs_object_types ot, acs_object_types ota
     where ota.object_type in (select object_type
                               from acs_object_types
                               where l_node <= ot.l_node
                               and r_node >= ot.r_node );

Triggers

Fields l_node and r_node are being maintained by triggers firing on all write operations to acs_object_types table.

These triggers are written in PostgreSQL's pltcl procedural language, so that they can be reused for any table involving parent-child relationships. They can also be adapted for other databases, except that Oracle will barf on you with famous "table is mutating" error because you want to manipulate rows of the same table the trigger is acting on.

  create trigger acs_obj_types_nested_set_ins_tr before insert
    on acs_object_types for each row
    execute procedure
    nested_set_insert_node('acs_object_types', 'object_type',
'supertype');

  create trigger acs_obj_types_nested_set_upd_tr after update
    on acs_object_types for each row
    execute procedure
    nested_set_update_node('acs_object_types', 'object_type',
'supertype') ;

  create trigger acs_obj_types_nested_set_del_tr before delete
    on acs_object_types for each row
    execute procedure
    nested_set_delete_node('acs_object_types', 'object_type',
'supertype') ;
Each trigger expects three arguments: table name and child and parent field column names.
  • INSERT trigger
This is quite simple; we check whether the parent of newly inserted row is NULL (so that we can put this element on the top of the hierarchy) or not (so we must locate its parent and insert it at appropriate place). In any case, left and right nodes are renumbered since there must be no gaps in numbering.
create function nested_set_insert_node () returns opaque as '
    set TABLE $1
    set COLUMN $2
    set PARENT_COLUMN $3
    if {![info exists NEW($PARENT_COLUMN)]} {
        #  New node is added at the end of the nested set list.
        spi_exec "select coalesce(max(r_node),0) as rightmost_node
from $TABLE"
        set NEW(l_node) [expr $rightmost_node + 1]
        set NEW(r_node) [expr $rightmost_node + 2]
    } else {
        spi_exec "select r_node as parent_r_node from $TABLE
          where $COLUMN = ''[quote $NEW($PARENT_COLUMN)]''"
        spi_exec "update $TABLE set l_node = l_node + 2
          where l_node > $parent_r_node"
        spi_exec "update $TABLE set r_node = r_node + 2
          where r_node >= $parent_r_node"

        set NEW(l_node) $parent_r_node
        set NEW(r_node) [expr $parent_r_node + 1]
    }
    return [array get NEW]
' language 'pltcl';
  • UPDATE trigger
create function nested_set_update_node () returns opaque as '
    set TABLE $1
    set COLUMN $2
    set PARENT_COLUMN $3
    #  We cannot use new.l_node and new.r_node since they can
    #  be superseded by bunch of UPDATEs in this very trigger
    #  (this happens if the trigger fires on more than 1 row).
    #  So we retrieve the actual value of l_node and r_node with this:
    spi_exec "select l_node as curr_l_node, r_node as curr_r_node
        from $TABLE
        where $COLUMN = ''[quote $NEW($COLUMN)]''"
    #  We have to rearrange nodes only if supertype has been changed.
    #  There are two main code branches:
    #    1.  supertype changes from NOT NULL to NULL
    #    2.  supertype changes from something to NOT NULL
    if {![info exists OLD($PARENT_COLUMN)]  
        &&  ![info exists NEW($PARENT_COLUMN)]} {
        return "OK"
    }
    if {[info exists OLD($PARENT_COLUMN)] 
        &&  [info exists NEW($PARENT_COLUMN)] 
        &&  ![string compare $OLD($PARENT_COLUMN)
$NEW($PARENT_COLUMN)]} {
        return "OK"
    }
    #  In any case we must shift entire subset out of its parent
    #  and put it to the right of the rightmost node, to prevent
    #  it from being overwritten by node adjustments.
    spi_exec "select max(r_node) as rightmost_node from $TABLE"
    set nested_set_width [expr $curr_r_node - $curr_l_node + 1]
    set shift_offset [expr $rightmost_node - $curr_l_node + 1]
    #  Shift nested subset out of its parent.
    spi_exec "update $TABLE
        set l_node = l_node + $shift_offset,
            r_node = r_node + $shift_offset
        where l_node >= $curr_l_node
          and r_node <= $curr_r_node"
    #  It''s good to know that the [$curr_l_node,$curr_r_node]
interval
    #  is now not occupied by nodes.
    #
    #  Case 1)
    #
    if {[info exists OLD($PARENT_COLUMN)]  
        &&  ![info exists NEW($PARENT_COLUMN)]} {
        #  Since we have already lifted our subset out of its
        #  position, we will simply renumber the nodes to fill the
gaps.
        spi_exec "update $TABLE
            set l_node = l_node - $nested_set_width
            where l_node > $curr_r_node"
        spi_exec "update $TABLE
            set r_node = r_node - $nested_set_width
            where r_node > $curr_r_node"
        return "OK"
    }
    #
    #  Case 2)
    #
    if {[info exists NEW($PARENT_COLUMN)]} {
        #  The tricky part here is that we must pay attention whether
        #  the gap is to the left or to the right of inserting point.
        spi_exec "select l_node as parent_l_node, r_node as
parent_r_node
            from $TABLE
            where $COLUMN = ''[quote $NEW($PARENT_COLUMN)]''"
        #  Where will this subset be moved?
        if {$parent_r_node < $curr_l_node} {
            #  Gap is to the right of inserting point
            spi_exec "update $TABLE set
                l_node = l_node + $nested_set_width
                where l_node > $parent_r_node
                and l_node < $curr_l_node"
            spi_exec "update $TABLE set
                r_node = r_node + $nested_set_width
                where r_node >= $parent_r_node
                and r_node < $curr_l_node"
            #  Place the subset under its new parent
            set shift_offset [expr $rightmost_node - $parent_r_node +
1]
        } else {
            #  The gap is to the LEFT of inserting point
            spi_exec "update $TABLE set
                l_node = l_node - $nested_set_width
                where l_node <= $parent_l_node
                and l_node > $curr_r_node"
            spi_exec "update $TABLE set
                r_node = r_node - $nested_set_width
                where r_node < $parent_l_node
                and r_node > $curr_r_node"
            set shift_offset [expr $rightmost_node + $nested_set_width
- $parent_l_node]
        }
        spi_exec "update $TABLE set
            r_node = r_node - $shift_offset,
            l_node = l_node - $shift_offset
            where l_node > $rightmost_node"
        return "OK"
    }
    return "OK"
' language 'pltcl';
  • DELETE trigger
create function nested_set_delete_node () returns opaque as '
    set TABLE $1
    set COLUMN $2
    set PARENT_COLUMN $3
    #  We must prevent deletion of non-leaf nodes.
    if {[expr $OLD(r_node) - $OLD(l_node)] != 1} {
        elog ERROR "Type ''$OLD($COLUMN)'' cannot be deleted, l_node =
$OLD(l_node)  r_node = $OLD(r_node)"
        return "SKIP"
    }
    spi_exec "update $TABLE set
        l_node = l_node - 2
        where l_node > $OLD(r_node)"
    spi_exec "update $TABLE set
        r_node = r_node - 2
        where r_node > $OLD(r_node)"
    return "OK"
' language 'pltcl';
Collapse
2: Multiple parents (response to 1)
Posted by Sebastian Skracic on
So far, everything worked as expected until I've stumbled upon acs-permissions-create.sql which defines:
create table acs_privileges (
        privilege       varchar(100) not null constraint acs_privileges_pk
                        primary key,
        pretty_name     varchar(100),
        pretty_plural   varchar(100)
);

create table acs_privilege_hierarchy (
        privilege       not null constraint acs_priv_hier_priv_fk
                        references acs_privileges (privilege),
        child_privilege not null constraint acs_priv_hier_child_priv_fk
                        references acs_privileges (privilege),
        constraint acs_privilege_hierarchy_pk
        primary key (privilege, child_privilege)
);

create or replace view acs_privilege_descendant_map
as select p1.privilege, p2.privilege as descendant
   from acs_privileges p1, acs_privileges p2
   where p2.privilege in (select child_privilege
                          from acs_privilege_hierarchy
                          start with privilege = p1.privilege
                          connect by prior child_privilege = privilege)
   or p2.privilege = p1.privilege;
Gotcha! Nested set model (as described above) won't work in this case. Nested sets represent flattened hierarchy tree where every item is supposed to appear exactly once.

Back to the privilege hierarchy: each privilege can have arbitrary number of child privileges, but some privileges can be child of (i.e. implied by) more than one parent privileges. For example, 'admin' implies 'read', 'write', 'delete', ... but it makes sense that 'moderate' implies 'read' as well:

  select privilege, child_privilege
    from acs_privilege_hierarchy
    order by privilege, child_privilege;

 privilege | child_privilege
-----------+-----------------
 admin     | create
 admin     | delete
 admin     | read
 admin     | write
 all       | admin
 all       | moderate
 moderate  | read
 moderate  | write
 read      | browse
(9 rows)
The hierarchy tree (actually this is not a tree any more) generated with Oracle's CONNECT BY will look like:
  • all
    • admin
      • create
      • delete
      • read
        • browse
      • write
    • moderate
      • read
        • browse
      • write
What we see here is that some subhierarchies are appearing multiple times in this quasi-tree. For example, 'read' privilege implies 'browse' child privilege, but it has two parent privileges: 'moderate' and 'admin'. Every time the 'read' appears as a child of either, it must carry the whole subtree along.

So, simple nested set model won't help us here because an element can appear multiple times in hierarchy.

Extended nested set model

After fiddling with this a little more, I've come up with the solution based on slightly extended nested set data model. In addition to acs_privilege and acs_privilege_hierarchy, I've created additional table to store sets themselves:
create table acs_privilege_hierarchy_index (
        privilege       varchar(100) not null
                        constraint acs_priv_hier_ndx_priv_fk
                        references acs_privileges (privilege),
        child_privilege varchar(100) not null
                        constraint acs_priv_hier_ndx_child_priv_fk
                        references acs_privileges (privilege),
        l_node          integer not null,
        r_node          integer not null
);
This table is used to translate quasi-hierarchy tree shown above into the nested set model. Why couldn't we simply put l_node and r_node in acs_privilege_hierarchy? Because a single entry in acs_privilege_hierarchy could generate many elements in quasi-hierarchy. In other words, (privilege, child_privilege) pair cannot uniquely describe position of particular element in quasi-hierarchical tree listing, so there is no point of attaching (l_node, r_node) attribute to it.

As in the "normal" nested set model, acs_privilege_hierarchy_index is being maintained with triggers for INSERT and DELETE on acs_privilege_hierarchy. acs_privilege_descendant_map can be defined through:


  create view acs_privilege_descendant_map
  as select p1.privilege, p2.privilege as descendant
     from
       acs_privileges p1,
       acs_privileges p2
     where p2.privilege in
                         (select h2.child_privilege
                            from
                                acs_privilege_hierarchy_index h1,
                                acs_privilege_hierarchy_index h2
                            where
                              h1.child_privilege = p1.privilege
                              and h1.l_node <= h2.l_node
                              and h1.r_node >= h2.r_node) ;

  select privilege, descendant
    from acs_privilege_descendant_map
    order by privilege, descendant;

 privilege | descendant
-----------+------------
 admin     | admin
 admin     | browse
 admin     | create
 admin     | delete
 admin     | read
 admin     | write
 all       | admin
 all       | all
 all       | browse
 all       | create
 all       | delete
 all       | moderate
 all       | read
 all       | write
 browse    | browse
 create    | create
 delete    | delete
 moderate  | browse
 moderate  | moderate
 moderate  | read
 moderate  | write
 read      | browse
 read      | read
 write     | write
(24 rows)

Triggers

I'm not going to enter into more details right now, I'll simply post the trigger code:
create trigger acs_privil_hier_nested_set_in_tr before insert
  on acs_privilege_hierarchy for each row
  execute procedure
  multiple_nested_set_insert_node('acs_privilege_hierarchy_index',
    'child_privilege', 'privilege');


create trigger acs_privil_hier_nested_set_del_tr after delete
  on acs_privilege_hierarchy for each row
  execute procedure
  multiple_nested_set_delete_node('acs_privilege_hierarchy_index',
    'child_privilege', 'privilege') ;



create function multiple_nested_set_insert_node () returns opaque as '

    set TABLE           $1
    set COLUMN          $2
    set PARENT_COLUMN   $3

    set CHILD_ID $NEW($COLUMN)
    set PARENT_ID $NEW($PARENT_COLUMN)

    set subset_width 2
    set garbage_collect_p 0

    #  Do we have parent already?

    spi_exec "select count(*) as parent_p from $TABLE
        where $COLUMN = ''[quote $PARENT_ID]''"
    if {!$parent_p} {
        spi_exec "select coalesce(max(r_node),0) as rightmost_node from $TABLE"
        spi_exec "insert into $TABLE
                         ($COLUMN, $PARENT_COLUMN, l_node, r_node)
                  values (''[quote $PARENT_ID]'', ''[quote $PARENT_ID]'',
                          $rightmost_node + 1, $rightmost_node + 2)"
    }

    #  Does the children stand on its own in hierarchy?  i.e. is this
    #  its first designated parent?

    spi_exec "select l_node as child_l_node, r_node as child_r_node
      from $TABLE
      where $COLUMN = $PARENT_COLUMN and $COLUMN = ''[quote $CHILD_ID]''"

    if {[info exists child_l_node]} {

        #  We should also garbage collect the space this subset occupied.
        set garbage_collect_p 1
        set subset_width [expr $child_r_node - $child_l_node + 1]

    } else {

        #    However, our child could still have had its own children,
        #  regardless of standing on top of hierarchy tree.
        #
        spi_exec "select l_node as child_l_node, r_node as child_r_node
            from $TABLE where $COLUMN = ''[quote $CHILD_ID]''
                          and $COLUMN <> $PARENT_COLUMN " {

            #  Child patterns should be all the same, so exit on first found
            set subset_width [expr $child_r_node - $child_l_node + 1]
            break
        }
    }
    if {![info exists child_l_node]} {
    }

    #  Inserting new CHILD node.  We need to lookup its parent and
    #  then insert in appropriate position.
    #     Attention!  There might be multiple parent entries!
    #  Perform this stuff for each parent entry found.
    set i 0
    set adjustment 0
    spi_exec "select r_node as parent_r_node
        from $TABLE where $COLUMN = ''[quote $PARENT_ID]''
        order by r_node" {

        incr parent_r_node $adjustment

        incr i
        if {$garbage_collect_p} {
            set garbage_collect_p 0
            spi_exec "update $TABLE
              set $PARENT_COLUMN = ''[quote $PARENT_ID]''
              where $COLUMN = $PARENT_COLUMN and $COLUMN = ''[quote $CHILD_ID]''"
            spi_exec "update $TABLE set l_node = l_node + $subset_width
              where l_node > $parent_r_node"
            spi_exec "update $TABLE set r_node = r_node + $subset_width
              where r_node >= $parent_r_node"
            set offset [expr $parent_r_node - $child_l_node]
            spi_exec "update $TABLE set l_node = l_node + $offset
              where l_node >= $child_l_node
                and l_node < $child_r_node"
            spi_exec "update $TABLE set r_node = r_node + $offset
              where r_node > $child_l_node
                and r_node <= $child_r_node"
            #  We now must renumber nodes to fill in the gaps.
            #  Gap is bounded by (child_l_node, child_r_node).
            spi_exec "update $TABLE set l_node = l_node - $subset_width
              where l_node >= $child_l_node"
            spi_exec "update $TABLE set r_node = r_node - $subset_width
              where r_node > $child_l_node"
        } else {
            #  Allocate $subset_width nodes
            spi_exec "update $TABLE set l_node = l_node + $subset_width
              where l_node > $parent_r_node"
            spi_exec "update $TABLE set r_node = r_node + $subset_width
              where r_node >= $parent_r_node"

            spi_exec "insert into $TABLE
              ($COLUMN, $PARENT_COLUMN, l_node, r_node)
              values
              (''[quote $CHILD_ID]'', ''[quote $PARENT_ID]'',
               $parent_r_node, $parent_r_node + $subset_width - 1)"

            incr adjustment $subset_width

            #  Now copy grandchildren (if exist)
            if {$subset_width > 2} {
                #  Careful not to locate the row just inserted!
                spi_exec "select l_node as child_l_node, r_node as child_r_node
                  from $TABLE
                  where $COLUMN = ''[quote $CHILD_ID]''
                  and l_node <> $parent_r_node" {
                    break
                }
                set offset [expr $parent_r_node - $child_l_node]
                spi_exec "insert into $TABLE
                  ($COLUMN, $PARENT_COLUMN, l_node, r_node)
                  select
                    $COLUMN, $PARENT_COLUMN, l_node + $offset, r_node + $offset
                  from $TABLE
                  where
                    l_node > $child_l_node
                    and r_node < $child_r_node"
            }
        }
    }


    return OK

' language 'pltcl';





create function multiple_nested_set_delete_node () returns opaque as '

    set TABLE           $1
    set COLUMN          $2
    set PARENT_COLUMN   $3

    set CHILD_ID $OLD($COLUMN)
    set PARENT_ID $OLD($PARENT_COLUMN)

    #  Find all nested subsets carrying ($CHILD_ID, $PARENT_ID) mappings

    set more_rows_p 1
    set move_to_the_top_p 1
    while {$more_rows_p} {
        set more_rows_p 0
        spi_exec "select l_node as subset_l_node, r_node as subset_r_node
            from $TABLE
            where $COLUMN = ''[quote $CHILD_ID]''
              and $PARENT_COLUMN = ''[quote $PARENT_ID]''" {

              set more_rows_p 1
              set subset_width [expr $subset_r_node - $subset_l_node + 1]
              break
        }
        if {$more_rows_p} {
            if {$move_to_the_top_p} {
                #
                #    Why this check?
                #
                #    Because particular subset can appear several times in
                #  nested set hierarchy, all of these subsets being
                #  identical.  We only need to preserve one of these,
                #  say first one found.  All others will be garbage
                #  collected.
                #
                #    There is an exception:
                #
                #    Check whether $CHILD_ID has any other parent left.
                #  If yes, we simply delete the first rec found as all others
                #  (since remaining parentage line(s) will take care of
                #  child''s children).  If not, move to the top of hierarchy
                #  tree, but update the $PARENT_COLUMN field in index entry to
                #  reflect these (i.e. point to child itself).
                spi_exec "select count(*) as other_parents_p
                  from $TABLE
                  where $COLUMN = ''[quote $CHILD_ID]''
                  and $PARENT_COLUMN <> ''[quote $PARENT_ID]''"
                if {$other_parents_p} {
                    set move_to_the_top_p 0
                }
            }
            if {$move_to_the_top_p} {

                set move_to_the_top_p 0

                #  Last minute check: if this child has no children,
                #  simply delete it from hierarchy.

                if {$subset_width == 2} {
                    spi_exec "delete from $TABLE
                        where $COLUMN = ''[quote $CHILD_ID]''
                        and l_node = $subset_l_node"
                } else {
                    #  We have $subset_l_node and $subset_r_node
                    spi_exec "update $TABLE
                      set $PARENT_COLUMN = ''[quote $CHILD_ID]''
                        where l_node = $subset_l_node
                          and r_node = $subset_r_node"
                    spi_exec "select max(r_node) as rightmost_node
                      from $TABLE"
                    set offset [expr $rightmost_node - $subset_l_node + 1]
                    #  Move subset to the top of hierarchy tree.
                    spi_exec "update $TABLE set
                        l_node = l_node + $offset,
                        r_node = r_node + $offset
                      where l_node >= $subset_l_node
                        and r_node <= $subset_r_node"
                }

            } else {

                spi_exec "delete from $TABLE
                  where l_node >= $subset_l_node
                    and r_node <= $subset_r_node"
            }

            #  Renumber and we''re all set.
            spi_exec "update $TABLE set
                l_node = l_node - $subset_width
                where l_node > $subset_l_node"
            spi_exec "update $TABLE set
                r_node = r_node - $subset_width
                where r_node > $subset_l_node"

            #  But hey!
            #  Since we''ve just deleted one entry from $TABLE
            #  we should delete its parent as well *if* the deleted
            #  entry had turned out to be its last child and *if*
            #  the parent has no parents of its own.
            spi_exec "select
                l_node as parent_l_node, r_node as parent_r_node
              from
                $TABLE
              where
                $COLUMN = $PARENT_COLUMN
                and $PARENT_COLUMN = ''[quote $PARENT_ID]''
                and r_node = l_node + 1" {
                
                spi_exec "delete from $TABLE
                  where l_node = $parent_l_node
                    and r_node = $parent_r_node"
                spi_exec "update $TABLE set
                  l_node = l_node - 2
                  where l_node > $parent_l_node"
                spi_exec "update $TABLE set
                  r_node = r_node - 2
                  where r_node > $parent_r_node"
            }
        }
    }

    return OK

' language 'pltcl';
Collapse
3: Calculating LEVEL (response to 1)
Posted by Sebastian Skracic on
Oracle's CONNECT BY internally generates LEVEL pseudocolumn which holds the nesting depth while it recursively chases parent-child links down to the leaf nodes. This is highly usable in any kind of report generation. There is another Oracle pseudocolumn, ROWNUM, available in all SELECT constructs. As its name implies, it holds ordinary number of currently fetched row. It is also worth mentioning that CONNECT BY queries have severe impact on ordering of output, since it will resemble the order in which the parent-child links have been built. Therefore most probably you won't want to change the ordering produced by CONNECT BY.

OTOH, in nested set model, we are forced to do these neat things manually. Fortunately, simply ordering by l_node will give us the same ordering as with Oracle's CONNECT BY:

  select object_type, supertype, l_node, r_node
  from acs_object_types
  order by l_node

   object_type   | supertype  | l_node | r_node
-----------------+------------+--------+--------
 acs_object      |            |      1 |     20
 relationship    | acs_object |      2 |      3
 party           | acs_object |      4 |     11
 person          | party      |      5 |      8
 user            | person     |      6 |      7
 group           | party      |      9 |     10
 membership_rel  | acs_object |     12 |     13
 composition_rel | acs_object |     14 |     15
 journal_entry   | acs_object |     16 |     17
 site_node       | acs_object |     18 |     19
(10 rows)
However, in nested set model we can't calculate LEVEL of recursion (nesting depth) directly. By "directly" I mean based on information available in current row. Instead we must resort to some scripting language (either inside or outside DBMS). It turns out not to be too difficult:
proc nested_set_compute_level {level r_node} {

  #    Maintain array holding level boundaries at caller's level
  upvar boundary boundary

  #    Now traverse down the $boundary array until we find which level
  #  this entry belongs to.  For this purpose we'll use r_node
  #  which we will successively compare to $boundary($level),
  #  $boundary($level-1) etc until we find the level the current row
  #  is bounded with.  When found, adjust the $level and its
  #  boundary.
  
  for {set i $level} {$i > 0} {incr i -1} {
      if {$r_node < $boundary($i)} {
          break
      }
  }
  set level [expr $i + 1]
  set boundary($level) $r_node

  return $level
}
Here's one simple example:
...

set db [ns_db gethandle]
set row [ns_db select $db "select object_type, supertype, l_node, r_node
      from acs_object_types order by l_node"]

set level 0
while {[ns_db getrow $db $row]} {

  set object_type [ns_set get $row object_type]
  set supertype [ns_set get $row supertype]
  set l_node [ns_set get $row l_node]
  set r_node [ns_set get $row r_node]

  set level [nested_set_compute_level $level $r_node]

  set ident_html ""
  regsub -all . [format "%*s" $level { }] {&nbsp; } ident_html

  append return_html "

    <tr>
      <td>$ident_html $object_type</td>
      <td>$supertype</td>
      <td>$level</td>
      <td>$l_node</td>
      <td>$r_node</td>
    </tr>

  "
}

ns_return ...
Previously displayed record set now appears as:
object_type supertype level l_node r_node
  acs_object 1 1 20
    relationship acs_object 2 2 3
    party acs_object 2 4 11
      person party 3 5 8
        user person 4 6 7
      group party 3 9 10
    membership_rel acs_object 2 12 13
    composition_rel acs_object 2 14 15
    journal_entry acs_object 2 16 17
    site_node acs_object 2 18 19
Collapse
Posted by Dan Wickstrom on
    Sebastian>   However, in nested set model we can't calculate LEVEL
    Sebastian> of recursion (nesting depth) directly.  By "directly" I
    Sebastian> mean based on information available in current row.
    Sebastian> Instead we must resort to some scripting language
    Sebastian> (either inside or outside DBMS).  It turns out not to
    Sebastian> be too difficult:

Actually, you can do this in sql:
create table acs_object_types (
       object_type     varchar(100) primary key,
       supertype       varchar(100) references acs_object_types,
       l_node          integer not null,
       r_node          integer not null             ...
);

insert into acs_object_types (object_type,supertype,l_node,r_node) values ('acs_object', null, 1, 20);
insert into acs_object_types (object_type,supertype,l_node,r_node) values ('relationship', 'acs_object', 2, 3);
insert into acs_object_types (object_type,supertype,l_node,r_node) values ('party', 'acs_object', 4, 11);
insert into acs_object_types (object_type,supertype,l_node,r_node) values ('person', 'party', 5, 8);
insert into acs_object_types (object_type,supertype,l_node,r_node) values ('user', 'person', 6, 7);
insert into acs_object_types (object_type,supertype,l_node,r_node) values ('group', 'party', 9, 10);
insert into acs_object_types (object_type,supertype,l_node,r_node) values ('membership_rel', 'acs_object', 12, 13);
insert into acs_object_types (object_type,supertype,l_node,r_node) values ('composition_rel', 'acs_object', 14, 15);
insert into acs_object_types (object_type,supertype,l_node,r_node) values ('journal_entry', 'acs_object', 16, 17);
insert into acs_object_types (object_type,supertype,l_node,r_node)
values ('site_node', 'acs_object', 18, 19);


    select obj1.object_type, 
           obj1.supertype, 
           obj1.l_node, 
           obj1.r_node,  
           count(obj1.supertype) + 1 as level
     from acs_object_types obj1, acs_object_types obj2
    where ((obj1.l_node > obj2.l_node and obj1.r_node < obj2.r_node) or 
           (obj1.supertype is null))
 group by obj1.object_type, 
          obj1.supertype, 
          obj1.l_node, 
          obj1.r_node
 order by obj1.l_node;


   object_type   | supertype  | l_node | r_node | level 
-----------------+------------+--------+--------+-------
 acs_object      |            |      1 |     20 |     1
 relationship    | acs_object |      2 |      3 |     2
 party           | acs_object |      4 |     11 |     2
 person          | party      |      5 |      8 |     3
 user            | person     |      6 |      7 |     4
 group           | party      |      9 |     10 |     3
 membership_rel  | acs_object |     12 |     13 |     2
 composition_rel | acs_object |     14 |     15 |     2
 journal_entry   | acs_object |     16 |     17 |     2
 site_node       | acs_object |     18 |     19 |     2
(10 rows)


Collapse
Posted by Nathan Stitt on
The set based method you outline is very good, and I have used it
before, however be aware that inserts can grow to be VERY expensive if
you have more than a few thousand rows in your table.

Miguel Sofer has a much better solution (IMHO) at:
http://www.utdt.edu/~mig/trees.tar.gz His solution is to use base 160
encoded strings in a text field.

I will illustrate with numbers so it's perhaps clearer:
So 1/2/4 in this field would mean that this post's id is 4 and it
refers to post 2, which refers in turn to post 1.

My example probably isn't very clear, but I encourage you to read his
paper to understand it further. The paper isn't complete yet, but
there is enough there to explain the general idea.  I plan on
implementing it soon for a project I'm working on and put an
explanation of the method up on my homepage at: http://www.stitt.org/

Collapse
Posted by Dan Wickstrom on
As part of the acs 4.x porting effort, I've done a trial implementation of the "m-vgID" method that Miguel Sofer outlines in his paper. This method provides the flexiblity of the nested set model, and it doesn't suffer from the same performance problems on insert/update operations. In addition, the implementation is simpler.

As with the nested set model, the tree hierarchy is encoded in a separate field, and triggers are used to maintain the hierarchy.

create table acs_object_types (
              object_type     varchar(100) primary key,
              supertype       varchar(100) references acs_object_types,
              sortkey         varchar(4000)       
);

The character Encodings are kept in a separate table which is used for doing the conversion to and from base-159 encoding.

CREATE TABLE tree_encodings (
        deci int primary key,
	code char(1) 
);

create index tree_encode_idx on tree_encodings(code);

In discussing the base-159 encoding scheme, Sofer mentions in his paper that if the database is not built with internationalization support, then it might be necessary to limit the key encodings to ASCII characters. In such a case, it would be necessary to use a base-64 encoding scheme. In my testing, I'm using a plain vanilla build of postgresql, and the base-159 encoding loads correctly, and it seems to operate well, so I'm not sure about Sofer's comment in this regard. It seems that postgresql supports the latin-1 character set by default. I would be interested in hearing comments from people with regards to potential problems that might be encountered if base-159 encoding is chosen as the default.

For the m-vgID model, there are two triggers, one for inserts and one for updates. The insert trigger function shown below, is quite simple. To calculate the newly inserted nodes sortkey value, it finds the max sortkey value among its sister nodes, increments it by one and appends it to it to the sortkey of its parent. If the newly inserted node doesn't have any sister nodes, then the sortkey value is set to '00' (see Sofer's paper for an explanation of the encoding of sortkeys in the m-vgID model).

create function acs_object_type_insert_tr () returns opaque as '
declare
        v_parent_sk     varchar;
        max_key         varchar;
begin
        select max(sortkey) into max_key 
          from acs_object_types 
         where supertype = new.supertype;

        select coalesce(max(sortkey),'''') into v_parent_sk 
          from acs_object_types 
         where object_type = new.supertype;

        new.sortkey := v_parent_sk || ''/'' || next_sortkey(max_key);

        return new;

end;' language 'plpgsql';

create trigger acs_object_type_insert_tr before insert 
on acs_object_types for each row 
execute procedure acs_object_type_insert_tr ();

The update trigger, is slightly more complicated, as it must update the whole sub-tree for the newly updated node. The first time through the loop, the clr_keys_p flag is true, and this forces all of the sub-trees sortkeys to be set to null, as a result, all of the sortkeys in the sub-tree are recomputed.

create function acs_object_type_update_tr () returns opaque as '
declare
        v_parent_sk     varchar;
        max_key         varchar;
        v_rec           record;
        clr_keys_p      boolean default ''t'';
begin
        if new.object_type = old.object_type and 
           new.supertype = old.supertype then

           return new;

        end if;

        for v_rec in select object_type
                       from acs_object_types 
                      where sortkey like new.sortkey || ''%''
                   order by sortkey
        LOOP
            if clr_keys_p then
               update acs_object_types set sortkey = null
               where sortkey like new.sortkey || ''%'';
               clr_keys_p := ''f'';
            end if;
            
            select max(sortkey) into max_key
              from acs_object_types 
              where supertype = (select supertype 
                                   from acs_object_types 
                                  where object_type = v_rec.object_type);

            select coalesce(max(sortkey),'''') into v_parent_sk 
              from acs_object_types 
             where object_type = (select supertype 
                                   from acs_object_types 
                                  where object_type = v_rec.object_type);

            update acs_object_types 
               set sortkey = v_parent_sk || ''/'' || next_sortkey(max_key)
             where object_type = v_rec.object_type;

        end LOOP;

        return new;

end;' language 'plpgsql';

create trigger acs_object_type_update_tr after update on acs_object_types
for each row execute procedure acs_object_type_update_tr ();

Each of the triggers references the "next_sortkey" routine shown below. This routine takes as its input the max sortkey value of the current nodes sister nodes and increments it by one using base 159 math. If the current node doesn't have any sister nodes at the same level, then a null is passed in and a value of "00" is returned. "00" is the encoded value of the first key for any level.

create function next_sortkey(varchar) returns varchar as '
declare
       skey     alias for $1;
       pos      integer;        
       dec_val  integer;
       stop     boolean default ''f'';
       carry    boolean default ''t'';
       nkey     varchar default '''';
       base     integer;
       ch       char(1);
begin
        base := default_tree_encoding_base();

        if skey is null then

           return ''00'';

        else
           pos := length(skey);
           LOOP               
               ch := substr(skey,pos,1);
               if carry = ''t'' then 
                   carry = ''f'';
                   select (deci + 1) % base into dec_val
                     from tree_encodings
                    where code = ch;
                   select code::varchar || nkey into nkey
                     from tree_encodings 
                    where deci = dec_val;
                   if dec_val = 0 then carry = ''t''; end if;
               else
                   nkey := ch::varchar || nkey;
               end if;
               pos := pos - 1;               
               select case when substr(skey,pos - 1,1) = ''/'' 
                               then ''t'' 
                               else ''f'' 
                           end into stop;

               exit when stop = ''t'';

           END LOOP;
           if carry = ''t'' then 
              nkey := ''0'' || nkey;
           end if;
        end if;

        select code::varchar || nkey into nkey
          from tree_encodings 
         where deci = length(nkey) - 1;

        return nkey;

end;' language 'plpgsql';

In the m-vgID model, the level corresponds to the number of '/' characters, so the pseudo level column can be calculated using the following simple pl/pgsql function:

create function tree_level(varchar) returns integer as '
declare
        inkey     alias for $1;
        cnt       integer default 0;
begin
        for i in 1..length(inkey) LOOP
            if substr(inkey,i,1) = ''/'' then
               cnt := cnt + 1;
            end if;
        end LOOP;

        return cnt;

end;' language 'plpgsql';

As with the nested set model, emulation of oracle connect-by queries can be achieved through simple queries based on the new sortkey field. Using the same data as the nested set example shown in the preceeding posts, I will show a couple of common queries with the m-vgID model.

A simple connect by statement can be emulated by selecting from acs_object_types with an "order by" on the sortkey:

select repeat(' ',(tree_level(sortkey) - 1)*2) || object_type as object_type, 
       supertype, 
       tree_level(sortkey) as level, 
       sortkey  
  from acs_object_types 
 order by sortkey;


    object_type    | supertype  | level |   sortkey
-------------------+------------+-------+--------------
 acs_object        |            |     1 | /00
   relationship    | acs_object |     2 | /00/00
   party           | acs_object |     2 | /00/01
     person        | party      |     3 | /00/01/00
       user        | person     |     4 | /00/01/00/00
     group         | party      |     3 | /00/01/01
   membership_rel  | acs_object |     2 | /00/02
   composition_rel | acs_object |     2 | /00/03
   journal_entry   | acs_object |     2 | /00/04
   site_node       | acs_object |     2 | /00/05                 
(10 rows) 

The sub-tree rooted at the 'party' object_type can be selected as follows:

select repeat(' ',(tree_level(sortkey) - 1)*2) || object_type as object_type, 
       supertype, 
       tree_level(sortkey) as level, 
       sortkey  
  from acs_object_types 
 where sortkey like (select sortkey || '%' 
                       from acs_object_types 
                      where object_type = 'party')
 order by sortkey;

 object_type | supertype  | level |   sortkey
-------------+------------+-------+--------------
   party     | acs_object |     2 | /00/01
     person  | party      |     3 | /00/01/00
       user  | person     |     4 | /00/01/00/00
     group   | party      |     3 | /00/01/01
(4 rows)                                                                       

And all of the ancestors of the 'user' object_type can be selected using the following join:

select repeat(' ',(tree_level(o2.sortkey) - 1)*2) || o2.object_type as object_type, 
       o2.supertype, 
       tree_level(o2.sortkey) as level, 
       o2.sortkey  
  from acs_object_types o1,  acs_object_types o2
 where o1.object_type = 'user'
   and o2.sortkey <= o1.sortkey
   and o1.sortkey like (o2.sortkey || '%') 
 order by sortkey;

 object_type | supertype  | level |   sortkey
-------------+------------+-------+--------------
 acs_object  |            |     1 | /00
   party     | acs_object |     2 | /00/01
     person  | party      |     3 | /00/01/00
       user  | person     |     4 | /00/01/00/00
(4 rows)                                                                       

Leaf node indication can be obtained by using a sub-select in the target list:

select repeat(' ',(tree_level(sortkey) - 1)*2) || object_type as object_type, 
       supertype, 
       tree_level(sortkey) as level, 
       sortkey,
       (select case when count(*) = 0 then 't' else 'f' end 
          from acs_object_types 
         where supertype = ot.object_type) as is_leaf_p
  from acs_object_types ot
 order by sortkey;

    object_type    | supertype  | level |   sortkey    | is_leaf_p 
-------------------+------------+-------+--------------+-----------
 acs_object        |            |     1 | /00          | f
   relationship    | acs_object |     2 | /00/00       | t
   party           | acs_object |     2 | /00/01       | f
     person        | party      |     3 | /00/01/00    | f
       user        | person     |     4 | /00/01/00/00 | t
     group         | party      |     3 | /00/01/01    | t
   membership_rel  | acs_object |     2 | /00/02       | t
   composition_rel | acs_object |     2 | /00/03       | t
   journal_entry   | acs_object |     2 | /00/04       | t
   site_node       | acs_object |     2 | /00/05       | t
(10 rows)
Collapse
Posted by Gilbert Wong on
Dan,

Is this the official CONNECT BY solution for OpenACS 4?  Thanks.

Collapse
Posted by Dan Wickstrom on
Yes, pretty much so.  The queries used here are represenative of what you would use in porting for openacs 4.0, but the triggers have evolved a little since I posted this.  Look at the triggers on acs_objects or acs_object_types for better examples.
Collapse
Posted by Claudio Pasolini on
Dan,
I'm very interested to your proposed solution, because I have to maintain hierarchies in a quite big table. I'm trying to apply your posting but I get an error because the function default_tree_encoding_base() is not defined. I also don't understand how I should populate the tree_encoding table.

TIA for any advice.

Claudio Pasolini

Collapse
Posted by David Walker on
Has any consideration been given to postgres inheritance?
I included a rough layout of the objects above using postgres inheritance.

a "select * from acs_user;" yields
 acs_object_id | acs_party_id | person_id | user_id
---------------+--------------+-----------+---------
             1 |            1 |         1 |       1


-- TOC Entry ID 5 (OID 30230)
--
-- Name: acs_object Type: TABLE Owner: root
--

CREATE TABLE "acs_object" (
	"acs_object_id" integer
);

--
-- TOC Entry ID 6 (OID 30240)
--
-- Name: acs_relationship Type: TABLE Owner: root
--

CREATE TABLE "acs_relationship" (
	"acs_relationship_id" integer
)
INHERITS ("acs_object");

--
-- TOC Entry ID 7 (OID 30253)
--
-- Name: acs_party Type: TABLE Owner: root
--

CREATE TABLE "acs_party" (
	"acs_party_id" integer
)
INHERITS ("acs_object");

--
-- TOC Entry ID 8 (OID 30266)
--
-- Name: acs_person Type: TABLE Owner: root
--

CREATE TABLE "acs_person" (
	"person_id" integer
)
INHERITS ("acs_party");

--
-- TOC Entry ID 9 (OID 30281)
--
-- Name: acs_user Type: TABLE Owner: root
--

CREATE TABLE "acs_user" (
	"user_id" integer
)
INHERITS ("acs_person");

--
-- TOC Entry ID 10 (OID 30298)
--
-- Name: acs_group Type: TABLE Owner: root
--

CREATE TABLE "acs_group" (
	"acs_group_id" integer
)
INHERITS ("acs_party");

--
-- TOC Entry ID 11 (OID 30313)
--
-- Name: acs_membership_rel Type: TABLE Owner: root
--

CREATE TABLE "acs_membership_rel" (
	"acs_membership_rel_id" integer
)
INHERITS ("acs_object");

--
-- TOC Entry ID 12 (OID 30326)
--
-- Name: acs_composition_rel Type: TABLE Owner: root
--

CREATE TABLE "acs_composition_rel" (
	"acs_composition_rel_id" integer
)
INHERITS ("acs_object");

--
-- TOC Entry ID 13 (OID 30339)
--
-- Name: acs_journal_entry Type: TABLE Owner: root
--

CREATE TABLE "acs_journal_entry" (
	"acs_journal_entry_id" integer
)
INHERITS ("acs_object");

--
-- TOC Entry ID 14 (OID 30352)
--
-- Name: acs_site_node Type: TABLE Owner: root
--

CREATE TABLE "acs_site_node" (
	"acs_site_node" integer
)
INHERITS ("acs_object");

Collapse
Posted by David Walker on
That is a postgres specific (not SQL92) extension of course so maybe it's not worth it if we want to be compatible with multiple databases.
Collapse
Posted by Dan Wickstrom on
We're using this extensively in openacs4.  If you download the latest code, you can find the tree encoding table and some tree-utility routines in openacs-4/packages/acs-kernel/sql/postgresql/postgresql.sql.  For corresponding trigger routines look at openacs-4/packages/acs-kernel/sql/postgresql/acs-objects-create.sql - specifically look at the acs_objects table definition and its associated triggers.

With regards to inheritance, we looked seriously at prior to starting the opeancs4 porting activities and we opted not to use it because it was deficient in serveral areas.

Collapse
13: Postgres Inheritance (response to 1)
Posted by David Walker on
I'm interested in which areas were deficient as I may wish to use this in an upcoming project of mine.
Collapse
Posted by Dan Wickstrom on
The main deficiencies that I recall are related to RI triggers and indices not being passed on to the inherited table.  You can work-around this by explicitly adding the triggers and indices after you subclass a table, but we thought that it would be very error-prone - especially on a large project like openacs4.  In general we thought that it would to a large number of performance and data consistency problems that would be hard to track down.

In addition to the above-named problems, the query syntax for selecting from inherited tables is not sql3 compliant and most likely it will change in the future.  Currently (or at least when we looked at it) if you select * from acs_objects, you will only get rows that correspond to items that are not part of inherited tables.  In order to actually get all of the rows (including the rows in inherited tables), you need to do: select * from acs_objects*.  This seems very error prone - especially since it leads to silent failures.  The query returns rows, just not all of the rows that you need.

This is all I can recall right now, but I wouldn't be surprised if I'm forgetting some other issues that we discussed when first looked into using inheritance.

Overall, inheritance seemed to have alot of potential - escpecially in regards to openacs4's object model, but the postgresql implementation is just not up to the task yet.

Collapse
Posted by David Walker on
Thanks Dan,

From my testing today, triggers and, I assume, indices, still do not get passed on.

It looks to me like the 2nd issue has been cleared up.  I'm looking at a Postgres 7.1.3 database and an "insert into acs_user" followed by "select * from acs_object" shows the row I just inserted.

Collapse
Posted by Don Baccus on
At the time we made the decision they were undecided as to whether or not to adopt the SQL3 syntax.

To expand on Dan's list a bit, there's no way to enforce a unique key across a class and its subclasses, for instance the primary key, because each class and subclass has its own separate index (which you must remember to create by hand).  This is very bad.

Also, RI doesn't work with an inheritance tree at this time - the triggers will enforce RI integrity on the named class (i.e. relation/table) rather than the class and its subclasses.

As Dan said - lots of potential, but it's not really feature complete at this time.

Collapse
Posted by Claudio Pasolini on
Thank you Dan,
I downloaded the OpenACS 4 code and now all the triggers and functions work fine.
Collapse
Posted by Don Baccus on
This strategy using LIKE doesn't scale well because the concat ("tree_sortkey || '%'") causes Postgres to not use the index (it can't because it has no way of knowing that tree_sortkey won't contain a non-trailing '%' or '_' character).

We've switched to using "between", along with a helper function.  Check out the sources for many examples - grep for "tree_sortkey between".