Forum OpenACS Development: Response to New implementation of tree_sortkey

Collapse
Posted by Dan Wickstrom on
I've been playing around with the new tree query support and I'm curious about a few of the implementation details. I haven't had a chance to follow the new tree query development in detail, so it's possible I missed the answers to my questions when I searched the forums.

Limiting the number of number of nodes per-level to 2^15 nodes seems to be too restrictive and it's likely to cause problems as the size of a site grows. A probable problem area in the kernel is the acs_objects table. The tree formed by acs_objects is likely to be wide and shallow, so it's not too hard to imagine more than 2^15 objects with identical context_ids. The length of the key value at each level is currently encoded with one bit. I think it would be prudent to either allocate two bits per-level for encoding the length, thus allowing key lengths of 6,14,22 or 30 bits, or as an alternative the current default key lengths could be changed from 7 and 15 bits to 15 and 31 bits. The first method would allow ~ 2^30 nodes per level while the second method would allow ~ 2^31 nodes per level. The storage requirements for the sortkeys would increase, but you would still have byte aligned data, I don't think the overall increase in query time would be significant. It should still be better than the old text based sortkeys.

There is a comment in acs-kernel/sql/postgresql/postgres.sql that the old text-based sortkey method was limited to 25k nodes per level. This is an incorrect statement. The old text-based sortkeys, for all practical purposes, had no limit on the number of nodes per-level. If you wanted to be pedantic, the limit = 159^159 ~ 1e350, and there was a tradeoff between the depth of the tree and the number of nodes per-level.

Why is a helper function required to get the tree_sortkey for a table when using the tree_ancestor_keys function? I've tried replacing the helper function with a select and I don't see any difference in the cost or execution time of the query. There is a quote in postgres.sql that indicates that the query will not work without the helper function:

"This query will use the index on tree_sortkey to scan acs_objects. The function to grab the tree_sortkey for the node is necessary (and must be defined for each table that uses our hierarchical query scheme) to avoid restrictions on the use of SQL functions that return sets. "

This however, doesn't seem to be the case as shown by the example below.


testtree=# select o.*
testtree-#  from tree_view o,
testtree-#    (select tree_ancestor_keys(acs_objects_get_tree_sortkey(2051)) as tree_sortkey) parents
testtree-#  where o.tree_sortkey = parents.tree_sortkey;
 level |     object_id     | context_id |                       tree_sortkey                       
-------+-------------------+------------+----------------------------------------------------------
     1 |  1                |            | 00000000
     2 |    307            |          1 | 0000000000000000
     3 |      311          |        307 | 000000000000000000000000
     4 |        323        |        311 | 00000000000000000000000000000000
     5 |          371      |        323 | 0000000000000000000000000000000000000000
     6 |            611    |        371 | 000000000000000000000000000000000000000000000000
     7 |              2051 |        611 | 00000000000000000000000000000000000000000000000000000000
(7 rows)

testtree=# explain select o.*
testtree-#  from tree_view o,
testtree-#    (select tree_ancestor_keys(acs_objects_get_tree_sortkey(2051)) as tree_sortkey) parents
testtree-#  where o.tree_sortkey = parents.tree_sortkey;
NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..2.03 rows=1 width=32)
  ->  Subquery Scan parents  (cost=0.00..0.00 rows=0 width=0)
        ->  Result  (cost=0.00..0.00 rows=0 width=0)
  ->  Index Scan using acs_objs_tree_skey_idx on acs_objects  (cost=0.00..2.02 rows=1 width=20)

EXPLAIN
testtree=# select o.*
testtree-#  from tree_view o,
testtree-#    (select tree_ancestor_keys((select tree_sortkey from acs_objects where object_id = 2051)) as tree_sortkey) parents
testtree-#  where o.tree_sortkey = parents.tree_sortkey;
 level |     object_id     | context_id |                       tree_sortkey                       
-------+-------------------+------------+----------------------------------------------------------
     1 |  1                |            | 00000000
     2 |    307            |          1 | 0000000000000000
     3 |      311          |        307 | 000000000000000000000000
     4 |        323        |        311 | 00000000000000000000000000000000
     5 |          371      |        323 | 0000000000000000000000000000000000000000
     6 |            611    |        371 | 000000000000000000000000000000000000000000000000
     7 |              2051 |        611 | 00000000000000000000000000000000000000000000000000000000
(7 rows)

testtree=# explain select o.*
testtree-#  from tree_view o,
testtree-#    (select tree_ancestor_keys((select tree_sortkey from acs_objects where object_id = 2051)) as tree_sortkey) parents
testtree-#  where o.tree_sortkey = parents.tree_sortkey;
NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..2.03 rows=1 width=32)
  ->  Subquery Scan parents  (cost=0.00..0.00 rows=0 width=0)
        ->  Result  (cost=0.00..0.00 rows=0 width=0)
              InitPlan
                ->  Index Scan using acs_objects_pkey on acs_objects  (cost=0.00..2.02 rows=1 width=12)
  ->  Index Scan using acs_objs_tree_skey_idx on acs_objects  (cost=0.00..2.02 rows=1 width=20)

EXPLAIN
testtree=# 

Overall, the implementation looks good, and it's nice to be able to eliminate the LANG=C requirement and the dependence on the collation order.