Forum OpenACS Development: Response to CONNECT BY solution

Collapse
Posted by Dan Wickstrom on
As part of the acs 4.x porting effort, I've done a trial implementation of the "m-vgID" method that Miguel Sofer outlines in his paper. This method provides the flexiblity of the nested set model, and it doesn't suffer from the same performance problems on insert/update operations. In addition, the implementation is simpler.

As with the nested set model, the tree hierarchy is encoded in a separate field, and triggers are used to maintain the hierarchy.

create table acs_object_types (
              object_type     varchar(100) primary key,
              supertype       varchar(100) references acs_object_types,
              sortkey         varchar(4000)       
);

The character Encodings are kept in a separate table which is used for doing the conversion to and from base-159 encoding.

CREATE TABLE tree_encodings (
        deci int primary key,
	code char(1) 
);

create index tree_encode_idx on tree_encodings(code);

In discussing the base-159 encoding scheme, Sofer mentions in his paper that if the database is not built with internationalization support, then it might be necessary to limit the key encodings to ASCII characters. In such a case, it would be necessary to use a base-64 encoding scheme. In my testing, I'm using a plain vanilla build of postgresql, and the base-159 encoding loads correctly, and it seems to operate well, so I'm not sure about Sofer's comment in this regard. It seems that postgresql supports the latin-1 character set by default. I would be interested in hearing comments from people with regards to potential problems that might be encountered if base-159 encoding is chosen as the default.

For the m-vgID model, there are two triggers, one for inserts and one for updates. The insert trigger function shown below, is quite simple. To calculate the newly inserted nodes sortkey value, it finds the max sortkey value among its sister nodes, increments it by one and appends it to it to the sortkey of its parent. If the newly inserted node doesn't have any sister nodes, then the sortkey value is set to '00' (see Sofer's paper for an explanation of the encoding of sortkeys in the m-vgID model).

create function acs_object_type_insert_tr () returns opaque as '
declare
        v_parent_sk     varchar;
        max_key         varchar;
begin
        select max(sortkey) into max_key 
          from acs_object_types 
         where supertype = new.supertype;

        select coalesce(max(sortkey),'''') into v_parent_sk 
          from acs_object_types 
         where object_type = new.supertype;

        new.sortkey := v_parent_sk || ''/'' || next_sortkey(max_key);

        return new;

end;' language 'plpgsql';

create trigger acs_object_type_insert_tr before insert 
on acs_object_types for each row 
execute procedure acs_object_type_insert_tr ();

The update trigger, is slightly more complicated, as it must update the whole sub-tree for the newly updated node. The first time through the loop, the clr_keys_p flag is true, and this forces all of the sub-trees sortkeys to be set to null, as a result, all of the sortkeys in the sub-tree are recomputed.

create function acs_object_type_update_tr () returns opaque as '
declare
        v_parent_sk     varchar;
        max_key         varchar;
        v_rec           record;
        clr_keys_p      boolean default ''t'';
begin
        if new.object_type = old.object_type and 
           new.supertype = old.supertype then

           return new;

        end if;

        for v_rec in select object_type
                       from acs_object_types 
                      where sortkey like new.sortkey || ''%''
                   order by sortkey
        LOOP
            if clr_keys_p then
               update acs_object_types set sortkey = null
               where sortkey like new.sortkey || ''%'';
               clr_keys_p := ''f'';
            end if;
            
            select max(sortkey) into max_key
              from acs_object_types 
              where supertype = (select supertype 
                                   from acs_object_types 
                                  where object_type = v_rec.object_type);

            select coalesce(max(sortkey),'''') into v_parent_sk 
              from acs_object_types 
             where object_type = (select supertype 
                                   from acs_object_types 
                                  where object_type = v_rec.object_type);

            update acs_object_types 
               set sortkey = v_parent_sk || ''/'' || next_sortkey(max_key)
             where object_type = v_rec.object_type;

        end LOOP;

        return new;

end;' language 'plpgsql';

create trigger acs_object_type_update_tr after update on acs_object_types
for each row execute procedure acs_object_type_update_tr ();

Each of the triggers references the "next_sortkey" routine shown below. This routine takes as its input the max sortkey value of the current nodes sister nodes and increments it by one using base 159 math. If the current node doesn't have any sister nodes at the same level, then a null is passed in and a value of "00" is returned. "00" is the encoded value of the first key for any level.

create function next_sortkey(varchar) returns varchar as '
declare
       skey     alias for $1;
       pos      integer;        
       dec_val  integer;
       stop     boolean default ''f'';
       carry    boolean default ''t'';
       nkey     varchar default '''';
       base     integer;
       ch       char(1);
begin
        base := default_tree_encoding_base();

        if skey is null then

           return ''00'';

        else
           pos := length(skey);
           LOOP               
               ch := substr(skey,pos,1);
               if carry = ''t'' then 
                   carry = ''f'';
                   select (deci + 1) % base into dec_val
                     from tree_encodings
                    where code = ch;
                   select code::varchar || nkey into nkey
                     from tree_encodings 
                    where deci = dec_val;
                   if dec_val = 0 then carry = ''t''; end if;
               else
                   nkey := ch::varchar || nkey;
               end if;
               pos := pos - 1;               
               select case when substr(skey,pos - 1,1) = ''/'' 
                               then ''t'' 
                               else ''f'' 
                           end into stop;

               exit when stop = ''t'';

           END LOOP;
           if carry = ''t'' then 
              nkey := ''0'' || nkey;
           end if;
        end if;

        select code::varchar || nkey into nkey
          from tree_encodings 
         where deci = length(nkey) - 1;

        return nkey;

end;' language 'plpgsql';

In the m-vgID model, the level corresponds to the number of '/' characters, so the pseudo level column can be calculated using the following simple pl/pgsql function:

create function tree_level(varchar) returns integer as '
declare
        inkey     alias for $1;
        cnt       integer default 0;
begin
        for i in 1..length(inkey) LOOP
            if substr(inkey,i,1) = ''/'' then
               cnt := cnt + 1;
            end if;
        end LOOP;

        return cnt;

end;' language 'plpgsql';

As with the nested set model, emulation of oracle connect-by queries can be achieved through simple queries based on the new sortkey field. Using the same data as the nested set example shown in the preceeding posts, I will show a couple of common queries with the m-vgID model.

A simple connect by statement can be emulated by selecting from acs_object_types with an "order by" on the sortkey:

select repeat(' ',(tree_level(sortkey) - 1)*2) || object_type as object_type, 
       supertype, 
       tree_level(sortkey) as level, 
       sortkey  
  from acs_object_types 
 order by sortkey;


    object_type    | supertype  | level |   sortkey
-------------------+------------+-------+--------------
 acs_object        |            |     1 | /00
   relationship    | acs_object |     2 | /00/00
   party           | acs_object |     2 | /00/01
     person        | party      |     3 | /00/01/00
       user        | person     |     4 | /00/01/00/00
     group         | party      |     3 | /00/01/01
   membership_rel  | acs_object |     2 | /00/02
   composition_rel | acs_object |     2 | /00/03
   journal_entry   | acs_object |     2 | /00/04
   site_node       | acs_object |     2 | /00/05                 
(10 rows) 

The sub-tree rooted at the 'party' object_type can be selected as follows:

select repeat(' ',(tree_level(sortkey) - 1)*2) || object_type as object_type, 
       supertype, 
       tree_level(sortkey) as level, 
       sortkey  
  from acs_object_types 
 where sortkey like (select sortkey || '%' 
                       from acs_object_types 
                      where object_type = 'party')
 order by sortkey;

 object_type | supertype  | level |   sortkey
-------------+------------+-------+--------------
   party     | acs_object |     2 | /00/01
     person  | party      |     3 | /00/01/00
       user  | person     |     4 | /00/01/00/00
     group   | party      |     3 | /00/01/01
(4 rows)                                                                       

And all of the ancestors of the 'user' object_type can be selected using the following join:

select repeat(' ',(tree_level(o2.sortkey) - 1)*2) || o2.object_type as object_type, 
       o2.supertype, 
       tree_level(o2.sortkey) as level, 
       o2.sortkey  
  from acs_object_types o1,  acs_object_types o2
 where o1.object_type = 'user'
   and o2.sortkey <= o1.sortkey
   and o1.sortkey like (o2.sortkey || '%') 
 order by sortkey;

 object_type | supertype  | level |   sortkey
-------------+------------+-------+--------------
 acs_object  |            |     1 | /00
   party     | acs_object |     2 | /00/01
     person  | party      |     3 | /00/01/00
       user  | person     |     4 | /00/01/00/00
(4 rows)                                                                       

Leaf node indication can be obtained by using a sub-select in the target list:

select repeat(' ',(tree_level(sortkey) - 1)*2) || object_type as object_type, 
       supertype, 
       tree_level(sortkey) as level, 
       sortkey,
       (select case when count(*) = 0 then 't' else 'f' end 
          from acs_object_types 
         where supertype = ot.object_type) as is_leaf_p
  from acs_object_types ot
 order by sortkey;

    object_type    | supertype  | level |   sortkey    | is_leaf_p 
-------------------+------------+-------+--------------+-----------
 acs_object        |            |     1 | /00          | f
   relationship    | acs_object |     2 | /00/00       | t
   party           | acs_object |     2 | /00/01       | f
     person        | party      |     3 | /00/01/00    | f
       user        | person     |     4 | /00/01/00/00 | t
     group         | party      |     3 | /00/01/01    | t
   membership_rel  | acs_object |     2 | /00/02       | t
   composition_rel | acs_object |     2 | /00/03       | t
   journal_entry   | acs_object |     2 | /00/04       | t
   site_node       | acs_object |     2 | /00/05       | t
(10 rows)