Forum OpenACS Development: tree_sortkey varbit limitations

Collapse
Posted by Tom Jackson on

I was hoping to use the acs_objects use of tree_sortkey to create a hierarchy. I just use the context_id to point to the parent object, and the tree_sortkey gets updated like magic.

Here is some example data from acs_objects:

object_id |                   tree_sortkey
-----------+--------------------------------------------------
       370 | 00000100000000000000001100000000
       371 | 0000010000000000000000110000000000000000
       372 | 000001000000000000000011000000000000000000000000
       374 | 000001000000000000000011000000000000000000000001
       373 | 0000010000000000000000110000000000000001
       375 | 00000100000000000000001100000001
       376 | 0000010000000000000000110000000100000000

I used the following db_multirow to create a visual representation of the hierarchy:

db_multirow -extend {
    indent
} categories blog_categories "
select
 c.*,
 o.context_id,
 o.tree_sortkey
from
 blog_categories c,
 acs_objects o
where
 c.category_id = o.object_id
order by
 o.tree_sortkey" {

     set varbit_length [string length $tree_sortkey]
     set indent_amount [expr ($varbit_length -32) / 4]
     set indent [string repeat "." $indent_amount]
 }

Each varbit is at least 32 chars long, and each child varbit is an additional 8 chars, so the above code adds two indent chars per child.

Now the question: it looks like this implimentation only allows for 256 children, or 8 bits. It this true, or is there more magic hidden in the pl code which maintains this?

Collapse
Posted by Jun Yamog on
Hi Tom,

I am a bit sleepy now, although not directly answering your question.  You can look at tree_level call instead of getting the length of tree_sortkey, etc.

Collapse
Posted by Tom Jackson on

Thanks Jun, that code seems to confirm that the toplevel has 32 bits of slots open, while children are limited to 8 bits per level. I'm not sure the using the tree_level would be any more efficient than the tcl code, they do pretty much the same thing, but tree_level is more general. I would still need to extend the db_multirow, or at least run a foreach to create the formatting.

Collapse
Posted by Don Baccus on
You'll note that the 32 bit (*not* char BTW) sortkeys have leading zeros while the 8 bit sortkeys have leading ones ...

I presume that's enough to answer your question?  If not check the sortkeys for the first 256 objects in your install :)

Collapse
Posted by Don Baccus on
BTW it's easy to calculate your indention string in the query itself using tree_level() - this isolates you from the consequences of any change in representation (not that I expect there to be any but still ...)
Collapse
Posted by Don Baccus on
Sorry flip my zeros and ones above ... keys 0..127 (not 255 :) and 32-bit keys start with 1. So your examples above are more deeply nested trees than you think:
openacs46=# select tree_level('00000100000000000000001100000000');
 tree_level 
------------
          4
(1 row)

openacs46=# 

Also if you calculate the indention string directly in your query you don't need to extend or db_foreach, just build a string with the right number of spaces:

lpad(' ',(tree_level(fs_tree.tree_sortkey) - 1), ' ') as spaces
Collapse
Posted by Tom Jackson on

Okay, I'm looking further. It seems from the level code that maybe if you go over the limit for the short 7 bit key plus the leading zero, you get a 31 bit key, plus the leading one.

So the tcl code I wrote would break pretty quickly in normal use. But will every installation start out at the same level? Maybe not? I might need to get the level for the root of my tree first.

Collapse
Posted by Tom Jackson on

So here are the results of adding over 127 children to one parent:

 object_id |                           tree_sortkey
-----------+------------------------------------------------------------------

       518 | 0000010000000000000000110000000001111100
       519 | 0000010000000000000000110000000001111101
       520 | 0000010000000000000000110000000001111110
       521 | 0000010000000000000000110000000001111111
       522 | 0000010000000000000000110000000010000000000000000000000010000000
       523 | 0000010000000000000000110000000010000000000000000000000010000001
       524 | 0000010000000000000000110000000010000000000000000000000010000010
       525 | 0000010000000000000000110000000010000000000000000000000010000011
(134 rows)

So the pl code smoothly transitions to a larger key. Cool!

Collapse
Posted by Tom Jackson on

One last note: besides being correct to use the tree_level function, it is significantly faster, probably an order of magnitude. The db_multirow ended up being:


db_multirow categories blog_categories "
select
 c.*,
 o.context_id,
 lpad('.',(tree_level(o.tree_sortkey) - 4), '.') as indent
from
 blog_categories c,
 acs_objects o
where
 c.category_id = o.object_id
order by
 o.tree_sortkey"
Collapse
Posted by Don Baccus on
Yes, that's it exactly, Tom - at the 7 bit limit I add a leading 1 and 31 bits of key data.

This ensures that 128 > 127 when compared as bitstrings:

127 = 01111111
128 = 10000000000000000000000010000000

PG bitstrings are compared just like bytestrings, in other words they're matched bit-by-bit until there's a mismatch, then the larger bit wins.

A little thought should convince you that this works for sortkeys attached to nodes at any level in the tree, of any number.