Forum OpenACS Development: CONNECT BY solution
create table acs_object_types ( object_type varchar(100) not null constraint acs_object_types_pk primary key, supertype varchar(100) constraint acs_object_types_supertype_fk references acs_object_types (object_type), ... );Suppose we have hiearchy defined with these entries:
object_type | supertype -----------------+------------ acs_object | relationship | acs_object party | acs_object person | party user | person group | party membership_rel | acs_object composition_rel | acs_object journal_entry | acs_object site_node | acs_objectOr, presented as hierarchy tree:
- acs_object
- relationship
- party
- person
- user
- group
- person
- membership_rel
- composition_rel
- journal_entry
- site_node
select object_type from acs_object_types start with object_type = 'party' connect by object_type = prior supertype;Shortly, we denote root of subtree we want to take out of hierarchy with START WITH clause, and criterion which maps child row to its parent with CONNECT BY clause.
Let's build the simple view which will hold all type-supertype mappings, regardless of number of generations involved:
create or replace view acs_object_type_supertype_map as select ot.object_type, ota.object_type as ancestor_type from acs_object_types ot, acs_object_types ota where ota.object_type in (select object_type from acs_object_types start with object_type = ot.supertype connect by object_type = prior supertype); select object_type, ancestor_type from acs_object_type_supertype_map order by object_type, ancestor_type; object_type | ancestor_type -----------------+----------------- acs_object | acs_object composition_rel | acs_object composition_rel | composition_rel group | acs_object group | group group | party journal_entry | acs_object journal_entry | journal_entry membership_rel | acs_object membership_rel | membership_rel party | acs_object party | party person | acs_object person | party person | person relationship | acs_object relationship | relationship site_node | acs_object site_node | site_node user | acs_object user | party user | person user | user (23 rows)Detailed explanation of handling trees in Oracle is available at Trees in Oracle SQL chapter in SQL for Web Nerds.
Nested set model
In contrast to parent-child model of representing trees, nested set model (as described in Celko's SQL for smarties), represents hierarchy by assigning two numbers (left and right node) to each element. Parent elements simply nests its children which can further have its own children etc:create table acs_object_types ( object_type varchar(100) not null constraint acs_object_types_pk primary key, l_node integer not null, r_node integer not null, ... );Previously shown hierarchy can be represented with following entries in nested set model:
object_type | l_node | r_node -----------------+--------+-------- acs_object | 1 | 20 relationship | 2 | 3 party | 4 | 11 person | 5 | 8 user | 6 | 7 group | 9 | 10 membership_rel | 12 | 13 composition_rel | 14 | 15 journal_entry | 16 | 17 site_node | 18 | 19Or illustrated:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 | | | | | | | | | | | | | | | | | | | | | relnshp | | +user+ | +group+ | membrel comp_r jrn_ent sit_nd | | | +-person-+ | | | +--------party--------+ | +---------------------------acs_object---------------------------+Searching for all subtypes of 'party' object_type can be performed by:
select object_type from acs_object_types where l_node > 4 and r_node < 11; object_type ------------- person user groupSimilarly, one can search for all supertypes of 'user' object_type:
select object_type from acs_object_types where l_node < 6 and r_node > 7; object_type ------------- person party acs_objectAgain, building parent-child mappings spanning multiple generations is nothing more difficult than with Oracle-specific CONNECT BY:
create view acs_object_type_supertype_map as select ot.object_type, ota.object_type as ancestor_type from acs_object_types ot, acs_object_types ota where ota.object_type in (select object_type from acs_object_types where l_node <= ot.l_node and r_node >= ot.r_node );
Triggers
Fields l_node and r_node are being maintained by triggers firing on all write operations to acs_object_types table.These triggers are written in PostgreSQL's pltcl procedural language, so that they can be reused for any table involving parent-child relationships. They can also be adapted for other databases, except that Oracle will barf on you with famous "table is mutating" error because you want to manipulate rows of the same table the trigger is acting on.
create trigger acs_obj_types_nested_set_ins_tr before insert on acs_object_types for each row execute procedure nested_set_insert_node('acs_object_types', 'object_type', 'supertype'); create trigger acs_obj_types_nested_set_upd_tr after update on acs_object_types for each row execute procedure nested_set_update_node('acs_object_types', 'object_type', 'supertype') ; create trigger acs_obj_types_nested_set_del_tr before delete on acs_object_types for each row execute procedure nested_set_delete_node('acs_object_types', 'object_type', 'supertype') ;Each trigger expects three arguments: table name and child and parent field column names.
- INSERT trigger
create function nested_set_insert_node () returns opaque as ' set TABLE $1 set COLUMN $2 set PARENT_COLUMN $3 if {![info exists NEW($PARENT_COLUMN)]} { # New node is added at the end of the nested set list. spi_exec "select coalesce(max(r_node),0) as rightmost_node from $TABLE" set NEW(l_node) [expr $rightmost_node + 1] set NEW(r_node) [expr $rightmost_node + 2] } else { spi_exec "select r_node as parent_r_node from $TABLE where $COLUMN = ''[quote $NEW($PARENT_COLUMN)]''" spi_exec "update $TABLE set l_node = l_node + 2 where l_node > $parent_r_node" spi_exec "update $TABLE set r_node = r_node + 2 where r_node >= $parent_r_node" set NEW(l_node) $parent_r_node set NEW(r_node) [expr $parent_r_node + 1] } return [array get NEW] ' language 'pltcl';
- UPDATE trigger
create function nested_set_update_node () returns opaque as ' set TABLE $1 set COLUMN $2 set PARENT_COLUMN $3 # We cannot use new.l_node and new.r_node since they can # be superseded by bunch of UPDATEs in this very trigger # (this happens if the trigger fires on more than 1 row). # So we retrieve the actual value of l_node and r_node with this: spi_exec "select l_node as curr_l_node, r_node as curr_r_node from $TABLE where $COLUMN = ''[quote $NEW($COLUMN)]''" # We have to rearrange nodes only if supertype has been changed. # There are two main code branches: # 1. supertype changes from NOT NULL to NULL # 2. supertype changes from something to NOT NULL if {![info exists OLD($PARENT_COLUMN)] && ![info exists NEW($PARENT_COLUMN)]} { return "OK" } if {[info exists OLD($PARENT_COLUMN)] && [info exists NEW($PARENT_COLUMN)] && ![string compare $OLD($PARENT_COLUMN) $NEW($PARENT_COLUMN)]} { return "OK" } # In any case we must shift entire subset out of its parent # and put it to the right of the rightmost node, to prevent # it from being overwritten by node adjustments. spi_exec "select max(r_node) as rightmost_node from $TABLE" set nested_set_width [expr $curr_r_node - $curr_l_node + 1] set shift_offset [expr $rightmost_node - $curr_l_node + 1] # Shift nested subset out of its parent. spi_exec "update $TABLE set l_node = l_node + $shift_offset, r_node = r_node + $shift_offset where l_node >= $curr_l_node and r_node <= $curr_r_node" # It''s good to know that the [$curr_l_node,$curr_r_node] interval # is now not occupied by nodes. # # Case 1) # if {[info exists OLD($PARENT_COLUMN)] && ![info exists NEW($PARENT_COLUMN)]} { # Since we have already lifted our subset out of its # position, we will simply renumber the nodes to fill the gaps. spi_exec "update $TABLE set l_node = l_node - $nested_set_width where l_node > $curr_r_node" spi_exec "update $TABLE set r_node = r_node - $nested_set_width where r_node > $curr_r_node" return "OK" } # # Case 2) # if {[info exists NEW($PARENT_COLUMN)]} { # The tricky part here is that we must pay attention whether # the gap is to the left or to the right of inserting point. spi_exec "select l_node as parent_l_node, r_node as parent_r_node from $TABLE where $COLUMN = ''[quote $NEW($PARENT_COLUMN)]''" # Where will this subset be moved? if {$parent_r_node < $curr_l_node} { # Gap is to the right of inserting point spi_exec "update $TABLE set l_node = l_node + $nested_set_width where l_node > $parent_r_node and l_node < $curr_l_node" spi_exec "update $TABLE set r_node = r_node + $nested_set_width where r_node >= $parent_r_node and r_node < $curr_l_node" # Place the subset under its new parent set shift_offset [expr $rightmost_node - $parent_r_node + 1] } else { # The gap is to the LEFT of inserting point spi_exec "update $TABLE set l_node = l_node - $nested_set_width where l_node <= $parent_l_node and l_node > $curr_r_node" spi_exec "update $TABLE set r_node = r_node - $nested_set_width where r_node < $parent_l_node and r_node > $curr_r_node" set shift_offset [expr $rightmost_node + $nested_set_width - $parent_l_node] } spi_exec "update $TABLE set r_node = r_node - $shift_offset, l_node = l_node - $shift_offset where l_node > $rightmost_node" return "OK" } return "OK" ' language 'pltcl';
- DELETE trigger
create function nested_set_delete_node () returns opaque as ' set TABLE $1 set COLUMN $2 set PARENT_COLUMN $3 # We must prevent deletion of non-leaf nodes. if {[expr $OLD(r_node) - $OLD(l_node)] != 1} { elog ERROR "Type ''$OLD($COLUMN)'' cannot be deleted, l_node = $OLD(l_node) r_node = $OLD(r_node)" return "SKIP" } spi_exec "update $TABLE set l_node = l_node - 2 where l_node > $OLD(r_node)" spi_exec "update $TABLE set r_node = r_node - 2 where r_node > $OLD(r_node)" return "OK" ' language 'pltcl';
create table acs_privileges ( privilege varchar(100) not null constraint acs_privileges_pk primary key, pretty_name varchar(100), pretty_plural varchar(100) ); create table acs_privilege_hierarchy ( privilege not null constraint acs_priv_hier_priv_fk references acs_privileges (privilege), child_privilege not null constraint acs_priv_hier_child_priv_fk references acs_privileges (privilege), constraint acs_privilege_hierarchy_pk primary key (privilege, child_privilege) ); create or replace view acs_privilege_descendant_map as select p1.privilege, p2.privilege as descendant from acs_privileges p1, acs_privileges p2 where p2.privilege in (select child_privilege from acs_privilege_hierarchy start with privilege = p1.privilege connect by prior child_privilege = privilege) or p2.privilege = p1.privilege;Gotcha! Nested set model (as described above) won't work in this case. Nested sets represent flattened hierarchy tree where every item is supposed to appear exactly once.
Back to the privilege hierarchy: each privilege can have arbitrary number of child privileges, but some privileges can be child of (i.e. implied by) more than one parent privileges. For example, 'admin' implies 'read', 'write', 'delete', ... but it makes sense that 'moderate' implies 'read' as well:
select privilege, child_privilege from acs_privilege_hierarchy order by privilege, child_privilege; privilege | child_privilege -----------+----------------- admin | create admin | delete admin | read admin | write all | admin all | moderate moderate | read moderate | write read | browse (9 rows)The hierarchy tree (actually this is not a tree any more) generated with Oracle's CONNECT BY will look like:
- all
- admin
- create
- delete
- read
- browse
- write
- moderate
- read
- browse
- write
- read
- admin
So, simple nested set model won't help us here because an element can appear multiple times in hierarchy.
Extended nested set model
After fiddling with this a little more, I've come up with the solution based on slightly extended nested set data model. In addition to acs_privilege and acs_privilege_hierarchy, I've created additional table to store sets themselves:create table acs_privilege_hierarchy_index ( privilege varchar(100) not null constraint acs_priv_hier_ndx_priv_fk references acs_privileges (privilege), child_privilege varchar(100) not null constraint acs_priv_hier_ndx_child_priv_fk references acs_privileges (privilege), l_node integer not null, r_node integer not null );This table is used to translate quasi-hierarchy tree shown above into the nested set model. Why couldn't we simply put l_node and r_node in acs_privilege_hierarchy? Because a single entry in acs_privilege_hierarchy could generate many elements in quasi-hierarchy. In other words, (privilege, child_privilege) pair cannot uniquely describe position of particular element in quasi-hierarchical tree listing, so there is no point of attaching (l_node, r_node) attribute to it.
As in the "normal" nested set model, acs_privilege_hierarchy_index is being maintained with triggers for INSERT and DELETE on acs_privilege_hierarchy. acs_privilege_descendant_map can be defined through:
create view acs_privilege_descendant_map as select p1.privilege, p2.privilege as descendant from acs_privileges p1, acs_privileges p2 where p2.privilege in (select h2.child_privilege from acs_privilege_hierarchy_index h1, acs_privilege_hierarchy_index h2 where h1.child_privilege = p1.privilege and h1.l_node <= h2.l_node and h1.r_node >= h2.r_node) ; select privilege, descendant from acs_privilege_descendant_map order by privilege, descendant; privilege | descendant -----------+------------ admin | admin admin | browse admin | create admin | delete admin | read admin | write all | admin all | all all | browse all | create all | delete all | moderate all | read all | write browse | browse create | create delete | delete moderate | browse moderate | moderate moderate | read moderate | write read | browse read | read write | write (24 rows)
Triggers
I'm not going to enter into more details right now, I'll simply post the trigger code:create trigger acs_privil_hier_nested_set_in_tr before insert on acs_privilege_hierarchy for each row execute procedure multiple_nested_set_insert_node('acs_privilege_hierarchy_index', 'child_privilege', 'privilege'); create trigger acs_privil_hier_nested_set_del_tr after delete on acs_privilege_hierarchy for each row execute procedure multiple_nested_set_delete_node('acs_privilege_hierarchy_index', 'child_privilege', 'privilege') ; create function multiple_nested_set_insert_node () returns opaque as ' set TABLE $1 set COLUMN $2 set PARENT_COLUMN $3 set CHILD_ID $NEW($COLUMN) set PARENT_ID $NEW($PARENT_COLUMN) set subset_width 2 set garbage_collect_p 0 # Do we have parent already? spi_exec "select count(*) as parent_p from $TABLE where $COLUMN = ''[quote $PARENT_ID]''" if {!$parent_p} { spi_exec "select coalesce(max(r_node),0) as rightmost_node from $TABLE" spi_exec "insert into $TABLE ($COLUMN, $PARENT_COLUMN, l_node, r_node) values (''[quote $PARENT_ID]'', ''[quote $PARENT_ID]'', $rightmost_node + 1, $rightmost_node + 2)" } # Does the children stand on its own in hierarchy? i.e. is this # its first designated parent? spi_exec "select l_node as child_l_node, r_node as child_r_node from $TABLE where $COLUMN = $PARENT_COLUMN and $COLUMN = ''[quote $CHILD_ID]''" if {[info exists child_l_node]} { # We should also garbage collect the space this subset occupied. set garbage_collect_p 1 set subset_width [expr $child_r_node - $child_l_node + 1] } else { # However, our child could still have had its own children, # regardless of standing on top of hierarchy tree. # spi_exec "select l_node as child_l_node, r_node as child_r_node from $TABLE where $COLUMN = ''[quote $CHILD_ID]'' and $COLUMN <> $PARENT_COLUMN " { # Child patterns should be all the same, so exit on first found set subset_width [expr $child_r_node - $child_l_node + 1] break } } if {![info exists child_l_node]} { } # Inserting new CHILD node. We need to lookup its parent and # then insert in appropriate position. # Attention! There might be multiple parent entries! # Perform this stuff for each parent entry found. set i 0 set adjustment 0 spi_exec "select r_node as parent_r_node from $TABLE where $COLUMN = ''[quote $PARENT_ID]'' order by r_node" { incr parent_r_node $adjustment incr i if {$garbage_collect_p} { set garbage_collect_p 0 spi_exec "update $TABLE set $PARENT_COLUMN = ''[quote $PARENT_ID]'' where $COLUMN = $PARENT_COLUMN and $COLUMN = ''[quote $CHILD_ID]''" spi_exec "update $TABLE set l_node = l_node + $subset_width where l_node > $parent_r_node" spi_exec "update $TABLE set r_node = r_node + $subset_width where r_node >= $parent_r_node" set offset [expr $parent_r_node - $child_l_node] spi_exec "update $TABLE set l_node = l_node + $offset where l_node >= $child_l_node and l_node < $child_r_node" spi_exec "update $TABLE set r_node = r_node + $offset where r_node > $child_l_node and r_node <= $child_r_node" # We now must renumber nodes to fill in the gaps. # Gap is bounded by (child_l_node, child_r_node). spi_exec "update $TABLE set l_node = l_node - $subset_width where l_node >= $child_l_node" spi_exec "update $TABLE set r_node = r_node - $subset_width where r_node > $child_l_node" } else { # Allocate $subset_width nodes spi_exec "update $TABLE set l_node = l_node + $subset_width where l_node > $parent_r_node" spi_exec "update $TABLE set r_node = r_node + $subset_width where r_node >= $parent_r_node" spi_exec "insert into $TABLE ($COLUMN, $PARENT_COLUMN, l_node, r_node) values (''[quote $CHILD_ID]'', ''[quote $PARENT_ID]'', $parent_r_node, $parent_r_node + $subset_width - 1)" incr adjustment $subset_width # Now copy grandchildren (if exist) if {$subset_width > 2} { # Careful not to locate the row just inserted! spi_exec "select l_node as child_l_node, r_node as child_r_node from $TABLE where $COLUMN = ''[quote $CHILD_ID]'' and l_node <> $parent_r_node" { break } set offset [expr $parent_r_node - $child_l_node] spi_exec "insert into $TABLE ($COLUMN, $PARENT_COLUMN, l_node, r_node) select $COLUMN, $PARENT_COLUMN, l_node + $offset, r_node + $offset from $TABLE where l_node > $child_l_node and r_node < $child_r_node" } } } return OK ' language 'pltcl'; create function multiple_nested_set_delete_node () returns opaque as ' set TABLE $1 set COLUMN $2 set PARENT_COLUMN $3 set CHILD_ID $OLD($COLUMN) set PARENT_ID $OLD($PARENT_COLUMN) # Find all nested subsets carrying ($CHILD_ID, $PARENT_ID) mappings set more_rows_p 1 set move_to_the_top_p 1 while {$more_rows_p} { set more_rows_p 0 spi_exec "select l_node as subset_l_node, r_node as subset_r_node from $TABLE where $COLUMN = ''[quote $CHILD_ID]'' and $PARENT_COLUMN = ''[quote $PARENT_ID]''" { set more_rows_p 1 set subset_width [expr $subset_r_node - $subset_l_node + 1] break } if {$more_rows_p} { if {$move_to_the_top_p} { # # Why this check? # # Because particular subset can appear several times in # nested set hierarchy, all of these subsets being # identical. We only need to preserve one of these, # say first one found. All others will be garbage # collected. # # There is an exception: # # Check whether $CHILD_ID has any other parent left. # If yes, we simply delete the first rec found as all others # (since remaining parentage line(s) will take care of # child''s children). If not, move to the top of hierarchy # tree, but update the $PARENT_COLUMN field in index entry to # reflect these (i.e. point to child itself). spi_exec "select count(*) as other_parents_p from $TABLE where $COLUMN = ''[quote $CHILD_ID]'' and $PARENT_COLUMN <> ''[quote $PARENT_ID]''" if {$other_parents_p} { set move_to_the_top_p 0 } } if {$move_to_the_top_p} { set move_to_the_top_p 0 # Last minute check: if this child has no children, # simply delete it from hierarchy. if {$subset_width == 2} { spi_exec "delete from $TABLE where $COLUMN = ''[quote $CHILD_ID]'' and l_node = $subset_l_node" } else { # We have $subset_l_node and $subset_r_node spi_exec "update $TABLE set $PARENT_COLUMN = ''[quote $CHILD_ID]'' where l_node = $subset_l_node and r_node = $subset_r_node" spi_exec "select max(r_node) as rightmost_node from $TABLE" set offset [expr $rightmost_node - $subset_l_node + 1] # Move subset to the top of hierarchy tree. spi_exec "update $TABLE set l_node = l_node + $offset, r_node = r_node + $offset where l_node >= $subset_l_node and r_node <= $subset_r_node" } } else { spi_exec "delete from $TABLE where l_node >= $subset_l_node and r_node <= $subset_r_node" } # Renumber and we''re all set. spi_exec "update $TABLE set l_node = l_node - $subset_width where l_node > $subset_l_node" spi_exec "update $TABLE set r_node = r_node - $subset_width where r_node > $subset_l_node" # But hey! # Since we''ve just deleted one entry from $TABLE # we should delete its parent as well *if* the deleted # entry had turned out to be its last child and *if* # the parent has no parents of its own. spi_exec "select l_node as parent_l_node, r_node as parent_r_node from $TABLE where $COLUMN = $PARENT_COLUMN and $PARENT_COLUMN = ''[quote $PARENT_ID]'' and r_node = l_node + 1" { spi_exec "delete from $TABLE where l_node = $parent_l_node and r_node = $parent_r_node" spi_exec "update $TABLE set l_node = l_node - 2 where l_node > $parent_l_node" spi_exec "update $TABLE set r_node = r_node - 2 where r_node > $parent_r_node" } } } return OK ' language 'pltcl';
OTOH, in nested set model, we are forced to do these neat things manually. Fortunately, simply ordering by l_node will give us the same ordering as with Oracle's CONNECT BY:
select object_type, supertype, l_node, r_node from acs_object_types order by l_node object_type | supertype | l_node | r_node -----------------+------------+--------+-------- acs_object | | 1 | 20 relationship | acs_object | 2 | 3 party | acs_object | 4 | 11 person | party | 5 | 8 user | person | 6 | 7 group | party | 9 | 10 membership_rel | acs_object | 12 | 13 composition_rel | acs_object | 14 | 15 journal_entry | acs_object | 16 | 17 site_node | acs_object | 18 | 19 (10 rows)However, in nested set model we can't calculate LEVEL of recursion (nesting depth) directly. By "directly" I mean based on information available in current row. Instead we must resort to some scripting language (either inside or outside DBMS). It turns out not to be too difficult:
proc nested_set_compute_level {level r_node} { # Maintain array holding level boundaries at caller's level upvar boundary boundary # Now traverse down the $boundary array until we find which level # this entry belongs to. For this purpose we'll use r_node # which we will successively compare to $boundary($level), # $boundary($level-1) etc until we find the level the current row # is bounded with. When found, adjust the $level and its # boundary. for {set i $level} {$i > 0} {incr i -1} { if {$r_node < $boundary($i)} { break } } set level [expr $i + 1] set boundary($level) $r_node return $level }Here's one simple example:
... set db [ns_db gethandle] set row [ns_db select $db "select object_type, supertype, l_node, r_node from acs_object_types order by l_node"] set level 0 while {[ns_db getrow $db $row]} { set object_type [ns_set get $row object_type] set supertype [ns_set get $row supertype] set l_node [ns_set get $row l_node] set r_node [ns_set get $row r_node] set level [nested_set_compute_level $level $r_node] set ident_html "" regsub -all . [format "%*s" $level { }] { } ident_html append return_html " <tr> <td>$ident_html $object_type</td> <td>$supertype</td> <td>$level</td> <td>$l_node</td> <td>$r_node</td> </tr> " } ns_return ...Previously displayed record set now appears as:
object_type supertype level l_node r_node acs_object 1 1 20 relationship acs_object 2 2 3 party acs_object 2 4 11 person party 3 5 8 user person 4 6 7 group party 3 9 10 membership_rel acs_object 2 12 13 composition_rel acs_object 2 14 15 journal_entry acs_object 2 16 17 site_node acs_object 2 18 19
Sebastian> However, in nested set model we can't calculate LEVEL Sebastian> of recursion (nesting depth) directly. By "directly" I Sebastian> mean based on information available in current row. Sebastian> Instead we must resort to some scripting language Sebastian> (either inside or outside DBMS). It turns out not to Sebastian> be too difficult:Actually, you can do this in sql:
create table acs_object_types ( object_type varchar(100) primary key, supertype varchar(100) references acs_object_types, l_node integer not null, r_node integer not null ... ); insert into acs_object_types (object_type,supertype,l_node,r_node) values ('acs_object', null, 1, 20); insert into acs_object_types (object_type,supertype,l_node,r_node) values ('relationship', 'acs_object', 2, 3); insert into acs_object_types (object_type,supertype,l_node,r_node) values ('party', 'acs_object', 4, 11); insert into acs_object_types (object_type,supertype,l_node,r_node) values ('person', 'party', 5, 8); insert into acs_object_types (object_type,supertype,l_node,r_node) values ('user', 'person', 6, 7); insert into acs_object_types (object_type,supertype,l_node,r_node) values ('group', 'party', 9, 10); insert into acs_object_types (object_type,supertype,l_node,r_node) values ('membership_rel', 'acs_object', 12, 13); insert into acs_object_types (object_type,supertype,l_node,r_node) values ('composition_rel', 'acs_object', 14, 15); insert into acs_object_types (object_type,supertype,l_node,r_node) values ('journal_entry', 'acs_object', 16, 17); insert into acs_object_types (object_type,supertype,l_node,r_node) values ('site_node', 'acs_object', 18, 19); select obj1.object_type, obj1.supertype, obj1.l_node, obj1.r_node, count(obj1.supertype) + 1 as level from acs_object_types obj1, acs_object_types obj2 where ((obj1.l_node > obj2.l_node and obj1.r_node < obj2.r_node) or (obj1.supertype is null)) group by obj1.object_type, obj1.supertype, obj1.l_node, obj1.r_node order by obj1.l_node; object_type | supertype | l_node | r_node | level -----------------+------------+--------+--------+------- acs_object | | 1 | 20 | 1 relationship | acs_object | 2 | 3 | 2 party | acs_object | 4 | 11 | 2 person | party | 5 | 8 | 3 user | person | 6 | 7 | 4 group | party | 9 | 10 | 3 membership_rel | acs_object | 12 | 13 | 2 composition_rel | acs_object | 14 | 15 | 2 journal_entry | acs_object | 16 | 17 | 2 site_node | acs_object | 18 | 19 | 2 (10 rows)
before, however be aware that inserts can grow to be VERY expensive if
you have more than a few thousand rows in your table.
Miguel Sofer has a much better solution (IMHO) at:
http://www.utdt.edu/~mig/trees.tar.gz His solution is to use base 160
encoded strings in a text field.
I will illustrate with numbers so it's perhaps clearer:
So 1/2/4 in this field would mean that this post's id is 4 and it
refers to post 2, which refers in turn to post 1.
My example probably isn't very clear, but I encourage you to read his
paper to understand it further. The paper isn't complete yet, but
there is enough there to explain the general idea. I plan on
implementing it soon for a project I'm working on and put an
explanation of the method up on my homepage at: http://www.stitt.org/
As with the nested set model, the tree hierarchy is encoded in a separate field, and triggers are used to maintain the hierarchy.
create table acs_object_types ( object_type varchar(100) primary key, supertype varchar(100) references acs_object_types, sortkey varchar(4000) );
The character Encodings are kept in a separate table which is used for doing the conversion to and from base-159 encoding.
CREATE TABLE tree_encodings ( deci int primary key, code char(1) ); create index tree_encode_idx on tree_encodings(code);
In discussing the base-159 encoding scheme, Sofer mentions in his paper that if the database is not built with internationalization support, then it might be necessary to limit the key encodings to ASCII characters. In such a case, it would be necessary to use a base-64 encoding scheme. In my testing, I'm using a plain vanilla build of postgresql, and the base-159 encoding loads correctly, and it seems to operate well, so I'm not sure about Sofer's comment in this regard. It seems that postgresql supports the latin-1 character set by default. I would be interested in hearing comments from people with regards to potential problems that might be encountered if base-159 encoding is chosen as the default.
For the m-vgID model, there are two triggers, one for inserts and one for updates. The insert trigger function shown below, is quite simple. To calculate the newly inserted nodes sortkey value, it finds the max sortkey value among its sister nodes, increments it by one and appends it to it to the sortkey of its parent. If the newly inserted node doesn't have any sister nodes, then the sortkey value is set to '00' (see Sofer's paper for an explanation of the encoding of sortkeys in the m-vgID model).
create function acs_object_type_insert_tr () returns opaque as ' declare v_parent_sk varchar; max_key varchar; begin select max(sortkey) into max_key from acs_object_types where supertype = new.supertype; select coalesce(max(sortkey),'''') into v_parent_sk from acs_object_types where object_type = new.supertype; new.sortkey := v_parent_sk || ''/'' || next_sortkey(max_key); return new; end;' language 'plpgsql'; create trigger acs_object_type_insert_tr before insert on acs_object_types for each row execute procedure acs_object_type_insert_tr ();
The update trigger, is slightly more complicated, as it must update the whole sub-tree for the newly updated node. The first time through the loop, the clr_keys_p flag is true, and this forces all of the sub-trees sortkeys to be set to null, as a result, all of the sortkeys in the sub-tree are recomputed.
create function acs_object_type_update_tr () returns opaque as ' declare v_parent_sk varchar; max_key varchar; v_rec record; clr_keys_p boolean default ''t''; begin if new.object_type = old.object_type and new.supertype = old.supertype then return new; end if; for v_rec in select object_type from acs_object_types where sortkey like new.sortkey || ''%'' order by sortkey LOOP if clr_keys_p then update acs_object_types set sortkey = null where sortkey like new.sortkey || ''%''; clr_keys_p := ''f''; end if; select max(sortkey) into max_key from acs_object_types where supertype = (select supertype from acs_object_types where object_type = v_rec.object_type); select coalesce(max(sortkey),'''') into v_parent_sk from acs_object_types where object_type = (select supertype from acs_object_types where object_type = v_rec.object_type); update acs_object_types set sortkey = v_parent_sk || ''/'' || next_sortkey(max_key) where object_type = v_rec.object_type; end LOOP; return new; end;' language 'plpgsql'; create trigger acs_object_type_update_tr after update on acs_object_types for each row execute procedure acs_object_type_update_tr ();
Each of the triggers references the "next_sortkey" routine shown below. This routine takes as its input the max sortkey value of the current nodes sister nodes and increments it by one using base 159 math. If the current node doesn't have any sister nodes at the same level, then a null is passed in and a value of "00" is returned. "00" is the encoded value of the first key for any level.
create function next_sortkey(varchar) returns varchar as ' declare skey alias for $1; pos integer; dec_val integer; stop boolean default ''f''; carry boolean default ''t''; nkey varchar default ''''; base integer; ch char(1); begin base := default_tree_encoding_base(); if skey is null then return ''00''; else pos := length(skey); LOOP ch := substr(skey,pos,1); if carry = ''t'' then carry = ''f''; select (deci + 1) % base into dec_val from tree_encodings where code = ch; select code::varchar || nkey into nkey from tree_encodings where deci = dec_val; if dec_val = 0 then carry = ''t''; end if; else nkey := ch::varchar || nkey; end if; pos := pos - 1; select case when substr(skey,pos - 1,1) = ''/'' then ''t'' else ''f'' end into stop; exit when stop = ''t''; END LOOP; if carry = ''t'' then nkey := ''0'' || nkey; end if; end if; select code::varchar || nkey into nkey from tree_encodings where deci = length(nkey) - 1; return nkey; end;' language 'plpgsql';
In the m-vgID model, the level corresponds to the number of '/' characters, so the pseudo level column can be calculated using the following simple pl/pgsql function:
create function tree_level(varchar) returns integer as ' declare inkey alias for $1; cnt integer default 0; begin for i in 1..length(inkey) LOOP if substr(inkey,i,1) = ''/'' then cnt := cnt + 1; end if; end LOOP; return cnt; end;' language 'plpgsql';
As with the nested set model, emulation of oracle connect-by queries can be achieved through simple queries based on the new sortkey field. Using the same data as the nested set example shown in the preceeding posts, I will show a couple of common queries with the m-vgID model.
A simple connect by statement can be emulated by selecting from acs_object_types with an "order by" on the sortkey:
select repeat(' ',(tree_level(sortkey) - 1)*2) || object_type as object_type, supertype, tree_level(sortkey) as level, sortkey from acs_object_types order by sortkey; object_type | supertype | level | sortkey -------------------+------------+-------+-------------- acs_object | | 1 | /00 relationship | acs_object | 2 | /00/00 party | acs_object | 2 | /00/01 person | party | 3 | /00/01/00 user | person | 4 | /00/01/00/00 group | party | 3 | /00/01/01 membership_rel | acs_object | 2 | /00/02 composition_rel | acs_object | 2 | /00/03 journal_entry | acs_object | 2 | /00/04 site_node | acs_object | 2 | /00/05 (10 rows)
The sub-tree rooted at the 'party' object_type can be selected as follows:
select repeat(' ',(tree_level(sortkey) - 1)*2) || object_type as object_type, supertype, tree_level(sortkey) as level, sortkey from acs_object_types where sortkey like (select sortkey || '%' from acs_object_types where object_type = 'party') order by sortkey; object_type | supertype | level | sortkey -------------+------------+-------+-------------- party | acs_object | 2 | /00/01 person | party | 3 | /00/01/00 user | person | 4 | /00/01/00/00 group | party | 3 | /00/01/01 (4 rows)
And all of the ancestors of the 'user' object_type can be selected using the following join:
select repeat(' ',(tree_level(o2.sortkey) - 1)*2) || o2.object_type as object_type, o2.supertype, tree_level(o2.sortkey) as level, o2.sortkey from acs_object_types o1, acs_object_types o2 where o1.object_type = 'user' and o2.sortkey <= o1.sortkey and o1.sortkey like (o2.sortkey || '%') order by sortkey; object_type | supertype | level | sortkey -------------+------------+-------+-------------- acs_object | | 1 | /00 party | acs_object | 2 | /00/01 person | party | 3 | /00/01/00 user | person | 4 | /00/01/00/00 (4 rows)
Leaf node indication can be obtained by using a sub-select in the target list:
select repeat(' ',(tree_level(sortkey) - 1)*2) || object_type as object_type, supertype, tree_level(sortkey) as level, sortkey, (select case when count(*) = 0 then 't' else 'f' end from acs_object_types where supertype = ot.object_type) as is_leaf_p from acs_object_types ot order by sortkey; object_type | supertype | level | sortkey | is_leaf_p -------------------+------------+-------+--------------+----------- acs_object | | 1 | /00 | f relationship | acs_object | 2 | /00/00 | t party | acs_object | 2 | /00/01 | f person | party | 3 | /00/01/00 | f user | person | 4 | /00/01/00/00 | t group | party | 3 | /00/01/01 | t membership_rel | acs_object | 2 | /00/02 | t composition_rel | acs_object | 2 | /00/03 | t journal_entry | acs_object | 2 | /00/04 | t site_node | acs_object | 2 | /00/05 | t (10 rows)
Is this the official CONNECT BY solution for OpenACS 4? Thanks.
I'm very interested to your proposed solution, because I have to maintain hierarchies in a quite big table. I'm trying to apply your posting but I get an error because the function default_tree_encoding_base() is not defined. I also don't understand how I should populate the tree_encoding table.
TIA for any advice.
Claudio Pasolini
I included a rough layout of the objects above using postgres inheritance.
a "select * from acs_user;" yields
acs_object_id | acs_party_id | person_id | user_id ---------------+--------------+-----------+--------- 1 | 1 | 1 | 1 -- TOC Entry ID 5 (OID 30230) -- -- Name: acs_object Type: TABLE Owner: root -- CREATE TABLE "acs_object" ( "acs_object_id" integer ); -- -- TOC Entry ID 6 (OID 30240) -- -- Name: acs_relationship Type: TABLE Owner: root -- CREATE TABLE "acs_relationship" ( "acs_relationship_id" integer ) INHERITS ("acs_object"); -- -- TOC Entry ID 7 (OID 30253) -- -- Name: acs_party Type: TABLE Owner: root -- CREATE TABLE "acs_party" ( "acs_party_id" integer ) INHERITS ("acs_object"); -- -- TOC Entry ID 8 (OID 30266) -- -- Name: acs_person Type: TABLE Owner: root -- CREATE TABLE "acs_person" ( "person_id" integer ) INHERITS ("acs_party"); -- -- TOC Entry ID 9 (OID 30281) -- -- Name: acs_user Type: TABLE Owner: root -- CREATE TABLE "acs_user" ( "user_id" integer ) INHERITS ("acs_person"); -- -- TOC Entry ID 10 (OID 30298) -- -- Name: acs_group Type: TABLE Owner: root -- CREATE TABLE "acs_group" ( "acs_group_id" integer ) INHERITS ("acs_party"); -- -- TOC Entry ID 11 (OID 30313) -- -- Name: acs_membership_rel Type: TABLE Owner: root -- CREATE TABLE "acs_membership_rel" ( "acs_membership_rel_id" integer ) INHERITS ("acs_object"); -- -- TOC Entry ID 12 (OID 30326) -- -- Name: acs_composition_rel Type: TABLE Owner: root -- CREATE TABLE "acs_composition_rel" ( "acs_composition_rel_id" integer ) INHERITS ("acs_object"); -- -- TOC Entry ID 13 (OID 30339) -- -- Name: acs_journal_entry Type: TABLE Owner: root -- CREATE TABLE "acs_journal_entry" ( "acs_journal_entry_id" integer ) INHERITS ("acs_object"); -- -- TOC Entry ID 14 (OID 30352) -- -- Name: acs_site_node Type: TABLE Owner: root -- CREATE TABLE "acs_site_node" ( "acs_site_node" integer ) INHERITS ("acs_object");
With regards to inheritance, we looked seriously at prior to starting the opeancs4 porting activities and we opted not to use it because it was deficient in serveral areas.
In addition to the above-named problems, the query syntax for selecting from inherited tables is not sql3 compliant and most likely it will change in the future. Currently (or at least when we looked at it) if you select * from acs_objects, you will only get rows that correspond to items that are not part of inherited tables. In order to actually get all of the rows (including the rows in inherited tables), you need to do: select * from acs_objects*. This seems very error prone - especially since it leads to silent failures. The query returns rows, just not all of the rows that you need.
This is all I can recall right now, but I wouldn't be surprised if I'm forgetting some other issues that we discussed when first looked into using inheritance.
Overall, inheritance seemed to have alot of potential - escpecially in regards to openacs4's object model, but the postgresql implementation is just not up to the task yet.
From my testing today, triggers and, I assume, indices, still do not get passed on.
It looks to me like the 2nd issue has been cleared up. I'm looking at a Postgres 7.1.3 database and an "insert into acs_user" followed by "select * from acs_object" shows the row I just inserted.
To expand on Dan's list a bit, there's no way to enforce a unique key across a class and its subclasses, for instance the primary key, because each class and subclass has its own separate index (which you must remember to create by hand). This is very bad.
Also, RI doesn't work with an inheritance tree at this time - the triggers will enforce RI integrity on the named class (i.e. relation/table) rather than the class and its subclasses.
As Dan said - lots of potential, but it's not really feature complete at this time.
I downloaded the OpenACS 4 code and now all the triggers and functions work fine.
We've switched to using "between", along with a helper function. Check out the sources for many examples - grep for "tree_sortkey between".