Forum OpenACS Development: Some thoughts with respect to the permissions system

This morning I stumbled across this thread: https://openacs.org/forums/message-view?message_id=170602 and more specifically this message from Don:
https://openacs.org/forums/message-view?message_id=174090

As I had a long commute today, I had some time to think about
it and here's the result. Consider it food-for-thought.

For those who did not RTFA :) - there are (at least) two issues
identified in the referenced threads:

a) There is no clean way to make sure that e.g. the original
  poster of a forum-message gets admin rights to his/her own
  message, other than explicitly assigning the privilege
  on the specific message

b) Checking permissions is relatively slow, despite all the
  optimalizations. And, mostly because of those very same
  optimalizations, the system is relatively complex.

----
To avoid flamewar-like responses like I saw in other permissionsystem-related threads: I'm *not* saying it is *too* slow or *too* complex, as I've done no benchmarks myself and have not found any good benchmarks to support such a claim. I'm just kicking around ideas. Feel free to ignore me.
----

If you read the forums, the topic gets a fair share of
attention and a lot of good ideas are tossed up. By combining several ideas in a clever manner I
think we can simplify the permission system and substantially gain on performance at the same time.

The basic problems are that:
* there is no concept of ownership, requiring quite a lot
  of direct, explicit privilege assignments. Beside being
  a pain in itself, it bloats the permissions system.
* the all_object_party_privileges_map depends on
  three huge flattened trees: context, parties and privileges.
  The size of this beast grows polynomial with the number
  of users & security contexts (read: objects). This means that the performance
  of the permissions system severly declines with
  increasing numbers of users and security contexts -
  or more in general the size of the site.

Basically my idea would be to:
1) Introduce the concept of ownership
    - reduces the need for explicit, direct permissions
2) Switch to an exclusively role-based access control system
    - de-couples the permissions check from the parties hierarchy
3) Switch to a limited set of 'horizontally stored' privileges
    (one-row-per-object, I think I found a solution to Don's
    implicit privileges problem).
    - de-couples the permissions check from both the privilege
      hierarchy and the context tree

The end result should be a permissions system where the cost of
CHECKING permissions is (almost) unrelated to the size of the site
or the number of users. Of course, no free lunches, the cost probably
moves to a certain extend to the ASSIGNING & CHANGING of permissions.
But this happens less frequently, is generaly not noticeable by the
end-user and I think it will not be too bad once implemented decently.

Each of those topics is explained in more detail below.

Introduction of ownership
--------------------------
By introducing object ownership, the issue as raised by dave
in the mentioned thread disappears and it will significantly
cut back on the need for directly assigned (non-inherited)
privileges.

We can then just assign privileges to 'the owner' - whomever
that happens to be(come).

Proposal: add an 'owner' field to acs_object, which defaults
to creation_user.

I'm aware of the fact that this has a 4 byte (on most platforms)
penalty per object, which is potentially a big number.
However, modifying_ip & modifying_user do not seem to have a lot
of serious usage (besides the kernel there are only 6 packages
that refer to them, and mostly just because the fields exist -
not because they're really used).

Besides that, I believe that those belong in an audit-log rather
than as attributes on an object - but that's just my opinion.

Anyway may be we can drop those - the nett-saving would be
somewhere around 12-20 bytes per object.

Switch to RBAC
----------------

Step two would be to never assign privileges directly to users,
but to roles (or groups - roles are essentially groups of course).

You may have 20k users, you'll never have 20k roles.  I think that
even in a very large organisation this number would not be more
than a few hundreds (and that would be a very bad case).

BTW: if I say roles are groups, I think of very simple
flat groups, not the complex things the oacs group system
can accomodate. I want just a simple mapping table that maps
users & roles.

Which roles a user can act in can easily be cached in a user/roles mapping table. It only
changes when a user's group-membership gets changed or if
he gets assigned other roles. Most users will have
only one or two roles and they normally do not change often.

This would also solve Dave's original problem
from the mentioned thread - there is always a standard role
of 'owner'.

Smart use of subsites (which I'm sure large site-maintainers
already do), combined with the use of roles rather than
assigning privileges directly has two advantages:

* it decouples the performance of the privilege system
  to a very large extend from the number of users (coupling it
  instead to the average number of different roles used per object,
  which will probably be near-constant even for large sites).
* cut back on the number of privileges to assign in general
  - which eases administration as a bonus

Worst case is that you're sometimes forced to create a role that
will only be used by one person. Imo that's not a big deal.

Horizontal privileges
-----------------------

(pay attention, this is a big deal)

Then step 3, both Don and Lars had ideas with respect to
"horizontal" privileges (storing the actual privileges
in the same row as the object_id/grantee combination with
one row per object).

For my line of thought using Don's bitstring idea (1 bit per priv)
is the best. Don's problem was the implicit and
inherited permissions.

I guess we can solve that by having three privilege columns per
object/grantee pair. All identical bitstrings. One containing the
direct and explicitly set privileges, one containing the inherited
privileges and one containing all privileges (calculated by or'ing
the explicit & inherited privileges and determining which privileges
are implied by the privileges then found). This 'all_privs' field
then becomes the 'inherited_privs' field of the direct descending
object (or security context).

There is a penalty of course when permissions are changed for a
given object, as the implicit & inherited privileges are to be
recalculated. I've not yet looked into how hard that would be but
I can't imagine that it is really huge as we essentially do it now on
every perm check.

The permissions table then would have rows (for each and every
object) like:
(object_id, role_id, explicit_privs, inherited_privs, all_privs)

** This would decouple the effort of checking for permissions
from the size of the context tree **

However, the size of the permissions table would grow linear with
respect to the number of objects and also linear with the
average number of different roles that have privileges assigned
on a certain object. This is by no means polynomial anymore, but
still far from constant.

Therefor, I would like to store a real security 'context' field for every
object - the context where that particular object actually inherits its
privileges from.

E.g. a message in a forum has its context_id set to
the object_id of the forum package, which has its context_id set
to (most likely) the object_id of the subsite. Now, if the subsite
grants 'read' privileges to everyone with the role 'member', and
both the forum and the message don't have direct privileges assigned,
the 'real' security context for the forum message should be
the object_id of the subsite, not of the forum package

(Because of the 'owner' role, there is probably no need to
assign direct privileges to individual forum posts.)

I'd say, rename the current context_id to parent_id (as it is mostly
(ab)used in that way already and in HEAD there is already a parent_id defined for this purpose) and use the context_id to really indicate
the security context. (see also https://openacs.org/forums/message-view?message_id=162608
and
https://openacs.org/forums/message-view?message_id=23188).

By doing so, we could eliminate all objects from the
permissions table that don't have directly assigned privileges WITHOUT
having to join against the context hierarchy (I'm not even sure
there is a use for an explicit context hierarchy anymore in this case).

With respect to custom privileges: afaict custom privileges
only have meaning locally to a package. So we could use
a 64 bit bitstring to store privileges, with the first
32 reserved for global, standardized privileges and the other
32 for local package use. That way each package can define up
to 32 custom privileges without any penalty. The kernel
just stores & retrieves them without having to know their
meaning (you effectively ask the kernel "is bit 34 set or not?").

Putting it all together
-------------------------

At the end we would have a table with rows like:

(object_id, role_id, explicit_privs, inherited_privs, all_privs)

In this table you have only objects that have set
direct privileges (as opposed to inherited privs).
Figuring out which privileges a certain user has on
a certain object would then involve the following steps:

- figure out which roles a user has (fast, can be cached by
  storing the user/role mappings in a seperate table which is
  kept up to date by triggers)
- figure out in which security context the object lives
  (easy, defined as an attribute on the object itself)
- get the 'all_privs' columns for the relevant object/context
- check if the correct bit is set or not

The size and performance of this system would only depend on two factors:
- the total number of objects that have direct privileges assigned
- the average number of different roles per object where privileges
  are assigned to.

This means that the relation between performance of the permissions system and the size of the site should be near constant (or log2(N) in worst case, as Don mentioned earlier "Huge tables are more a space problem, though, not necessarily a significant performance problem given the log2(N) performance characteristics of queries on simple tables with proper indexes.")

This new query would replace the acs_object_party_privilege_map and should
be both much simpler and faster. Without sacrificing any of the functionality
in existance today.

Migration from current situation
--------------------------------------

The good news is that the actual permissions checking mechanism
would remain the same. Just the way we compute
the answer will differ. So most application code does not have
to be touched at all.

The worse news is that the way privileges are assigned
is substantially different, as you can no longer
assign privileges to users - only to roles. The better news is that
this is highly centralized (if I'm correctly informed).

The only real issue I see (besides the work itself :) is the
implementation of the 'ownership' concept, as that is
sprinkled throughout the code of several packages.

------------------------------
THE END
------------------------------

Gentlemen, load your guns and enjoy yourself by shooting big gaping holes in my little thought-experiment ...

Collapse
Posted by Don Baccus on
At the risk of seeming rude ...

May I suggest that rather than continue to try to make mountains of molehills we concentrate on known mountains?

The permissions system has worked very well for us in nearly all cases.  I will lobby extremely hard against any fundamental change of the semantics of the permission system.

As far as making the permissions system work more efficiently without changing the underlying semantics, AFAIC go for it.

However I already know how we can get another large improvement in permissions performance for complex sites, particularly .LRN:

Get rid of custom permissions that duplicate global permissions, i.e. foo_read and the like.  A standard .LRN install gives us 69 privileges.  I think that could be cut down to perhaps 10 or so.

It is also clear you misunderstand how the permissions system scales today.  While the view you mention would grow polynomially if it were materialized, it never is.  It's always qualified by specific objects, parties etc.

In this case the two largest tables - the object table and the context hierarchy table - are joined using b-trees.

This means the cost for permissions checking grows at log2(N), in reality there's no polynomial cost involved.  If there were, the system would collapse after a few hundred or thousand objects were inserted.

As it is, we're seeing sites with over a million objects that perform reasonably well on modest hardware.  Galileo's .LRN installation comes to mind.  1.5 million objects, something like 16,000 users, running on a relatively modest dual P4 Xeon system.

You have to understand that my efforts to improve permissions performance done for 4.6 was my *first* attempt to attack the largest issue, the fact that PG doesn't optimize UNION views and that this, in fact, did lead to polynomial degradation of the performance (PG would materialize the views).

It was never intended to be my LAST effort to improve performance.  It's just that performance improved so dramatically that permissions performance ceased to be crisis.

Now that we're starting to see some very large OpenACS sites, particularly .LRN, it's probably become time to look into improving performance once again, but I haven't heard anyone claim that it's a crisis as it once was ...

BTW in practice using groups/roles to assign permissions rather than do so to individuals is actually what we do except for a user's own objects, and a direction we want to continue to move in.  This is why we have an admin group built automatically for subsites now (as of ... 5.0 I think?)

This reduces the need to assign individual permissions to users.

It's not dependent on changing the underlying semantics of the permissions system, though, we can have this benefit without making any changes whatsoever in the semantics...

Ok - makes sense. As I said, I was just kicking around ideas.
And I indeed didn't realize at the moment that those large views never materialize.

And how about just implementing the owner concept? That seems like a usefull thing to me. Or is there a specific reason it's never implemented? I tried searching the forums about this topic but got only posts refering to package ownership - which was not exactly what I meant :)

Collapse
Posted by Don Baccus on
A minor point ...

"By doing so, we could eliminate all objects from the
permissions table that don't have directly assigned privileges WITHOUT
having to join against the context hierarchy (I'm not even sure
there is a use for an explicit context hierarchy anymore in this case)."

But you would have to join against the object table to pick up context_id, right?  Aren't you just replacing one join with another?

Now, there are more rows in the context hierarchy table than there are in the objects table but each row only contains two integers, while objects are considerably larger.  The size of  this table hasn't really been an issue thus far.

Yes - indeed, that was part of the idea. Trade joining against a large table for joining against a smaller table.

Also you could actually keep joining against acs_object_context_index,  and transform that table just into a flat (object_id,context_id) mapping.

Anyway - all said and done I don't think this would be a big win after all.

The bitstring privilege approach still feels like there should be something to gain from, but indeed the question is 'how much'.

Collapse
Posted by Don Baccus on
The one denormalized table that does grow to be huge is the acs_object_party_privilege_map.  However that's a problem that's much worse with Postgres, because of locking issues in PG within PL/pgSQL code.  In Oracle I don't keep duplicate rows, but rather maintain a counter.  But I can't properly lock within triggers in PG so have to keep duplicate rows (which occur due to the relseg/group model in particular, but can also occur elsewhere, there's no restriction on the number of times you can grant a party a privilege on an object).

This is particularly a problem for .LRN.

But ... despite this, we see surprisingly acceptable performance for sites like Galileo (I'm trying to gather some statistics now).

I'm still interested in the bitstring approach, but am still unsure it's worth the trouble.

Since the bitstring approach requires we reduce the number of privileges in the system and since this will speed permission checking somewhat, doing the reduction still seems like the first step.

Plus the code wouldn't be as stupid, frankly seeing packages define "foo-read" etc just looks stupid :)

Is an ownership field separate from creation_user really that useful?  I'm suspicious of granting rights implicitly unless there's a very good reason to do so...

Also I think if we did a permissions audit of existing packages we'd find there's a fair amount of needless perms being granted.  File storage seems particularly weird in this regard.

Collapse
Posted by Don Baccus on
The win for only allowing one direct context_id for inheritance (rather than the inheritance tree we have today) is essentially just the average depth of the inheritance tree times the number of objects multiplied by the average number of objects that don't have inheritance turned off.

While I don't have any real-world stats for that, those are essentially linear relationships.  Given the indexes on the table, this means we have a log2(N) join cost on something that scales by some combination of linear factors.

So I don't think there's a LOT to gain in scalability from getting rid of the inheritance tree, moving to a simple inherit-from-one-object-only paradigm.

Of course in all our work with denormalized tables we're assuming that space is relatively unimportant.  With hobbyists being able to build servers with 4GB RAM thus far I think we're on pretty safe ground making this assumption.  I did worry about this a bit when I introduced the acs_object_party_privilege_map but as mentioned above, thus far for real sites it's been OK to eat the space.