Forum OpenACS Development: Response to ACS 4.x won't scale (I hope I am wrong)

Collapse
Posted by Albert Langer on
Fine grained permissions within the DBMS *are* a difficult problem. The implementation is bad but enforcing permissions in the
application code has proved pretty horrendous too, so I wouldn't
like to lightly abandon shielding it from application developers.

I've been worrying about this since looking at the trees stuff in
the kernel and rushing off to review details of the various
methods for Oracle CONNECT BY, DB2 WITH, SQL92 RECURSIVE,
Celko's stuff and the ingenious Postgresql approach used
here:

https://openacs.org/forums/message-view?message_id=16799

http://www.arsdigita.com/books/sql/trees.html

Joe Celko, SQL for Smarties, in DBMS Online, March to June 1996
<http://www.dbmsmag.com/9603d06.html>
<http://www.dbmsmag.com/9604d06.html>
<http://www.dbmsmag.com/9605d06.html>
<http://www.dbmsmag.com/9606d06.html>

Graeme Birchall, DB2 UDB V6.1 SQL Cookbook,
http://ourworld.compuserve.com/homepages/Graeme_Birchall/DB2V70CK.PDF
via
http://ourworld.compuserve.com/homepages/Graeme_Birchall/HTM_COOK.HTM

In case a re-design is necessary I'd suggest carefully reviewing
the APIs now to ensure that work which has to go ahead on porting
before any such redesign won't have to be redone as a result of it
due to some omission of parameters in the API that could be provided
for, even though not really used now.

I haven't looked at OpenLDAP recently but last time I looked
the umich LDAP implementations, of which OpenLDAP is one, just
read a permission scheme into RAM from a config file at startup.
There's still no IETF standard for administering access controls
externally via LDAP.

Following won't help with Jon's problem for Monday, but I
believe the ActiveDirectory approach could be the ultimate
speedup if a redesign is necessary.

Each entry (acs_object) has an extra column for an opaque
binary "security_descriptor". This encodes (horrendously but
efficiently) a complex structured object which completely
spells out what groups and roles (or by exception individuals)
have exactly what permissions for specific sets of columns or
columns of that entry, *including* those inherited from the
context so there is no need to do any join at all. Instead
the *user* has a token (derived from a kerberos ticket)
which equally opaquely, efficiently and horrendously encodes
what groups and roles etc they have. The system then compares
the security token and description to filter the results obtained
from the underlying database (a simple JET B-tree ISAM similar
to BerkelyDB) or to veto changes etc. It might be possible to
do something equivalent with embedded java for Oracle and
embedded Python or C for PostgreSQL though it would be a major
undertaking and I can't see how it could be done for an RDBMS
that doesn't have a powerful procedural language for doing
complex calculations (though who knows - I'd have given up on
an instring implemented in pgsql 😉

This is designed for very fine grained permissions with the
Directory situation of large numbers of brief connections instead of the usual DBMS situation. It works much more efficiently than the
more conventional approach used in Novell Directories and I believe
they have also started extending it to MS SQLserver etc.

One of the attractions for Microsoft of course is that you can
only modify security descriptors (and create session access tokens)
through system APIs that are an effective barrier to competition,
even though the details are fully documented.

However I believe the approach also does have technical merit for
getting the highest possible performance. It trades off a relatively
small extra computation (plus significantly more complex design for
the code that actually does this) on each *access* to permissions,
for no join against data from any other entry on each read or write to an entry(row).

In a sense they are splitting the join into a view of groups and roles the user has, crystalized as a token for each *connected* user
and a view of group and role permissions that an object has,
crystalized as a security_descriptor for each row, with the join
criteria itself being a complex function (available in Postgresql and in Oracle) of these two crystalized views instead of a natural
join on some linking columns.

The performance tradeoff is just that for *changing* permissions from
a higher level context, the write trigger has to propagate it to
the entire sub-tree below - but that is very infrequent compared
to the continous permissions checking on every select and update and
the small extra calculations to copy the parent context's permissions
into the child row's security_descriptor in insert triggers.

This works especially well with the kerberos integration, which
includes a list of groups the user is a member of in the ticket
used to connect for creating a user token. (But of course the Kerberos server gets that list by querying the directory and a DBMS can do the same to get the list of groups and roles for a user).

An equivalent might be to perform a "login" by creating a token
with a list of groups for a user and inserting that into a user_session table (already present in ACS) before doing any
any reads or writes "as" that user.

Then each stored procedure API call for a read or write has
to include the userid of the user on whose behalf the call is
being made, and calls a sub-procedure which looks up this
stuff from the connected users table before doing whatever
else it has to do.

It would be overkill if there's a simpler solution, but it
scales far beyond anything currently available with DBMS
based permissions as opposed to directory permissions and
leaves "middleware" approaches dead in the water.