Forum OpenACS Development: ACS 4.x won't scale (I hope I am wrong)

Here is an Explain Plan from a customer site that has a measly 294 users and 52 groups.

Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: CHOOSE
      1   TABLE ACCESS (BY USER ROWID) OF 'VIEW$'

**********************************************************************
**********

SELECT DECODE(COUNT(*),0,'f','t')
FROM
 ACS_OBJECT_PARTY_PRIVILEGE_MAP  WHERE OBJECT_ID = :b1  AND PARTY_ID 
= :b2
  AND PRIVILEGE = :b3


call     count       cpu    elapsed       disk      query    
current        rows
------- ------  -------- ---------- ---------- ---------- ----------
  ----------
Parse        1      0.02       0.03          0          0          
0           0
Execute     34      0.16       0.18          0          0          
0           0
Fetch       34     92.40      92.62          6   16864586        
680          34
------- ------  -------- ---------- ---------- ---------- ----------
  ----------
total       69     92.58      92.83          6   16864586        
680          34

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Despite the crappy formatting you will see that fetch query is a nice high number, and this query (run on almost every page) takes 92.83 seconds.

That view is a complete nightmare and hopefully it can be fixed.

I am looking into it as I write this but it is such an integral part of 4.x that I am afraid it could be a major design flaw.

Collapse
Posted by Don Baccus on
Yes, perms are a major design flaw.  That's been alluded to here in various posts, and Malte Sussdorf (one of my favorite ex-aD types) passed along some stern cautions regarding perms that included horrific performance numbers in regard to a bank project they worked on in Germany.

Even Philip says 4.x is incomplete and sucks, though of course he claims it is all the fault of the VCs and their lackeys, and that he bears no responsibility for this product.  I don't even want to go there...

Perms need a severe re-do, this is clear.  I've been thinking that this and the APM are two areas I would spend time on as the port progresses.  Not alone, necessarily, but its crucial enough so I'm going to want to have my thumbs stuck deep into it.

I don't think this needs to stand in the way of the porting effort, which after all is more than anything else about building a framework that can support multiple databases.  Yes, this seems a no-brainer but  after the Oracle uber alles attitude held by aD for so long it seems that a proof-of-concept goal must be utmost in our minds.

I guess I mean it's not going to be *worse* than the aD-delivered version when we get to the end of this first step.  After all, we'll be running their code.

I hope we'll have time to make it better before we declare victory (i.e. our first release), of course, but proving the feasibility of the multi-db approach is #1 on my priority list.  That's largely because I'm very confident we can fix these weaknesses once we've got the basic toolkit framework in place.  Again, hopefully that comes early enough in the process so we can redo the most gross aspects of 4.x before releasing.

And this is certainly one of them (the lack of meaningful admin stuff is another, I love the aD employee's comment in another thread about loading places and getting a list of 90,000 objects on his admin objects page!)

Collapse
Posted by Don Baccus on
BTW it sure looks like sequential table scans are being done here in order to calculate the joins...nested-loop style (which grow n^2, or more specifically l*r where l is the number of rows in the left table and r in the right table).

Obviously the key is to keep perm queries at an o(log2(n)) or o(log2(l*r)) or something along those lines if they're to be of any use.  o(r*log2(l)) style queries are ok (i.e. row-by-row matching  of r with rows selected by a btree index on l).

These views that are being used are very suspect...

Collapse
Posted by Jon Griffin on
I need to talk with Malte as I heard thru a source, that didn't want to be named, that Germany had some sort of solution that didn't get rolled back in to the 4.x series. I unfortunatly don't have his email.

Right now I have to have this fixed (or at least 1 order of magnitude less f'd up) by Monday morning, so you will get your fix earlier than you think. This is my paying gig so I can't put it off for a later revision.

This also means that I can't do as much on consolidating the acs-messaging et al until this is fixed. Maybe I will get lucky and Malte will have a partial fix that can hold my client over.

Collapse
Posted by Don Baccus on
This is more important than acs-messaging anyway, IMO.  We need a solution for this before we can release 4.x anyway, and if we were to learn that a redesign/reimplementation is needed then the earlier the better.

I hope that's not true...

Collapse
Posted by finn knudsen on
Hi Jon

Although I haven't played around much with ACS 4 much, I have spent the last 3 years working with Oracle in a bank, designing and implementing a risk management system. A big problem is things like which trade belongs to wich department, and departments are part of other departments etc.

I checked out some random bits I found at google ( just searcing for the name of the view actually turned up quite a few hits. )

http://www.google.com/search?q=cache:acs40.arsdigita.com/doc/sql/display-sql%3Furl%3Dacs-permissions-create.sql%26package_key%3Dacs-kernel+acs_privilege_descendant_map+&hl=da

From the small time I spend on it, it looks like that the view

create or replace view acs_privilege_descendant_map as select p1.privilege, p2.privilege as descendant from acs_privileges p1, acs_privileges p2 where p2.privilege in (select child_privilege from acs_privilege_hierarchy start with privilege = p1.privilege connect by prior child_privilege = privilege) or p2.privilege = p1.privilege;

is used alot, through the acs_object_grantee_priv_map. Is that also the case in the OpenACS distribution ? Anyway, on Oracle 7.3 which I was working on, views on views on views.. etc generally was very difficult to get working. The Oracle optimizer generally didn't do that great.

For what its worth I would suggest two possible solutions that could be explored.

1. Merge the views by hand. That is, take the view acs_object_party_privilege_map and copy paste the underlying views into this. Of course this makes the code less reusable but you don't have to break the structure of the OpenACS system. It is much easier to optimize this by hints, than on a series of dependent views.

2. The connect by view above, could also be part of the problem. A solution we used to speed performance was to create a table, with the contents of our department structure in. That way we could build indexes on the relevant keys.

Again, I haven't spend time on OpenACS 4 ( only 3.24 ) so the above may be nonsense, but I thought the problems described here where similar to some I had experienced earlier.

Collapse
Posted by Don Baccus on
I think you're on the right track, Finn.  Somewhere, I think in private e-mail, I mentioned to Jon that expanding the views and/or underlying function calls might be necessary for better performance.  I suspect the connect by is a problem, too, they're known to be expensive queries.

I also e-mail Jon Malte's e-mail address.

Collapse
Posted by Dan Wickstrom on
I've thought about this problem a little as the underlying acs_privilege_hierarchy table forms a convergent directed-graph instead of a tree structure, so I had to jump through some hoops to make this work in openacs. One thing I'm planning on doing when I get some time is to rewrite acs_privilege_descendant_map as a table that is populated by an insert trigger on acs_privilege_hierarchy. This might help some, but I would think that acs_privilege_hierarchy table is quite small, so I'm thinking there must be some other problems here besides acs_privilege_descendant_map view.

Probably, as was pointed out by others, the nesting of views is causing problems with optimizing the queries, though I'm not sure that postgresql has the same problem in this respect. It would to see how the two db's compare.

Collapse
Posted by Dan Wickstrom on
In the last line, I meant to say, "it would be interesting to see how the two db's compare".
Collapse
Posted by Andrew Piskorski on
FYI, Vadim Nasardinov, who wrote up some useful little notes on permissioning back c. ACS 4.0.1, tells me that the acs_object_context_index table used to cache the permissioning stuff, "grows polynomially with respect to the average number of descendants that an object has, and exponentially with respect to the depth of the context tree".
Collapse
Posted by Don Baccus on
Uhhh...that's not good.  I think we're in for a redesign, though Jon needs a quick hack to get out of his jam first.
Collapse
Posted by Tom Jackson on

It seems to me that the aD design goal for the permissioning system was to be able to reach into one table or view and pull out a yes or no answer: permission to read, or not, or 'Can object A perform action B on object C'. Essentially this is done by counting at least one row in acs_object_party_privilege_map. This view is a join of two other views (acs_object_grantee_priv_map and group_member_map) unioned with one of the views (acs_object_grantee_priv_map). acs_object_grantee_priv_map is a join of two other views, etc...

Permissioning is either a difficult problem, or one that isn't solved very well in ACS. Although many times aD points out the state of the art for certain problems, I haven't seen a real discussion on this one.

The other day I setup an OpenLDAP server to work with the Netscape Roaming profile schema. I had to set one permission record in the startup file. This one record serves to configure access for all clients!

I don't really know if ldap is an answer, certainly not a simple answer. I just wonder how permissioning is done so easily in this application, and so painfully in the RDBMS world.

Collapse
Posted by Henry Minsky on
I haven't had much chance to work with the acs 4 permission system,
but it seems like there ought to be ways to make it workable
without blindly flattening out the hierarchy and causing polynomial
or exponential sized tables.

In a general way, the permission system seems very nice; you can
assign any object a permission on any other object. When you go
to check a permission between object A and object B,  if no explicit permission relation is found, the system
will automatically search up both object's context chains until a permission is found. This is however potentially expensive, either
in time or space, depending on the implementation.

It may be possible to optimize this in various ways, but fundamentally, I think it would be better to look at permissions
as a heavyweight operation, not a lightweight one. Thus, you should
write your code accordingly; don't rely on the system to check
permissions of users and bboard messages, instead explicitly check
they user's group against the message's forum object, and so on.

I think of it like memory allocation in Java  or Lisp; just because the evaluator hides the cost of consing up a String, you should not
blindly do that when you can get away with appending to a StringBuffer, etc.

The way that ACS4 tried to sweep this stuff under the rug is indicative of the general problem there now in all their software
development - they have very smart but inexeperienced guys doing
the core architecture, so they don't seem to know how to avoid
the less obvious consequences of adding complex features. While
Philip may have been a little too extreme in the other direction,
leading to overly redundant code, at least the system was nice and
flat and easy to modify, without several layers of abstraction in
the programming language and then in the database.

I think it is becoming clear that it was a mistake to shove this stuff
into the database layer, but I think judicious use of the
permissions system makes sense, if you don't rely on it to
be efficient at a fine grain.

Collapse
Posted by Henry Minsky on
To follow up Tom Jackson's comment, the only information
that the system is implemented to answer efficiently is a yes/no question - so doing
a SELECT * from those views and getting combinatorial numbers of
rows is not really unexpected.  You are in effect asking for the relations of everything to everything. If you really do want to actually enumerate
a specific set of permissions (i.e., you really do want a list
of all users authorized to view object X), you ought to query the data model more
directly.
Collapse
Posted by Albert Langer on
Fine grained permissions within the DBMS *are* a difficult problem. The implementation is bad but enforcing permissions in the
application code has proved pretty horrendous too, so I wouldn't
like to lightly abandon shielding it from application developers.

I've been worrying about this since looking at the trees stuff in
the kernel and rushing off to review details of the various
methods for Oracle CONNECT BY, DB2 WITH, SQL92 RECURSIVE,
Celko's stuff and the ingenious Postgresql approach used
here:

https://openacs.org/forums/message-view?message_id=16799

http://www.arsdigita.com/books/sql/trees.html

Joe Celko, SQL for Smarties, in DBMS Online, March to June 1996
<http://www.dbmsmag.com/9603d06.html>
<http://www.dbmsmag.com/9604d06.html>
<http://www.dbmsmag.com/9605d06.html>
<http://www.dbmsmag.com/9606d06.html>

Graeme Birchall, DB2 UDB V6.1 SQL Cookbook,
http://ourworld.compuserve.com/homepages/Graeme_Birchall/DB2V70CK.PDF
via
http://ourworld.compuserve.com/homepages/Graeme_Birchall/HTM_COOK.HTM

In case a re-design is necessary I'd suggest carefully reviewing
the APIs now to ensure that work which has to go ahead on porting
before any such redesign won't have to be redone as a result of it
due to some omission of parameters in the API that could be provided
for, even though not really used now.

I haven't looked at OpenLDAP recently but last time I looked
the umich LDAP implementations, of which OpenLDAP is one, just
read a permission scheme into RAM from a config file at startup.
There's still no IETF standard for administering access controls
externally via LDAP.

Following won't help with Jon's problem for Monday, but I
believe the ActiveDirectory approach could be the ultimate
speedup if a redesign is necessary.

Each entry (acs_object) has an extra column for an opaque
binary "security_descriptor". This encodes (horrendously but
efficiently) a complex structured object which completely
spells out what groups and roles (or by exception individuals)
have exactly what permissions for specific sets of columns or
columns of that entry, *including* those inherited from the
context so there is no need to do any join at all. Instead
the *user* has a token (derived from a kerberos ticket)
which equally opaquely, efficiently and horrendously encodes
what groups and roles etc they have. The system then compares
the security token and description to filter the results obtained
from the underlying database (a simple JET B-tree ISAM similar
to BerkelyDB) or to veto changes etc. It might be possible to
do something equivalent with embedded java for Oracle and
embedded Python or C for PostgreSQL though it would be a major
undertaking and I can't see how it could be done for an RDBMS
that doesn't have a powerful procedural language for doing
complex calculations (though who knows - I'd have given up on
an instring implemented in pgsql 😉

This is designed for very fine grained permissions with the
Directory situation of large numbers of brief connections instead of the usual DBMS situation. It works much more efficiently than the
more conventional approach used in Novell Directories and I believe
they have also started extending it to MS SQLserver etc.

One of the attractions for Microsoft of course is that you can
only modify security descriptors (and create session access tokens)
through system APIs that are an effective barrier to competition,
even though the details are fully documented.

However I believe the approach also does have technical merit for
getting the highest possible performance. It trades off a relatively
small extra computation (plus significantly more complex design for
the code that actually does this) on each *access* to permissions,
for no join against data from any other entry on each read or write to an entry(row).

In a sense they are splitting the join into a view of groups and roles the user has, crystalized as a token for each *connected* user
and a view of group and role permissions that an object has,
crystalized as a security_descriptor for each row, with the join
criteria itself being a complex function (available in Postgresql and in Oracle) of these two crystalized views instead of a natural
join on some linking columns.

The performance tradeoff is just that for *changing* permissions from
a higher level context, the write trigger has to propagate it to
the entire sub-tree below - but that is very infrequent compared
to the continous permissions checking on every select and update and
the small extra calculations to copy the parent context's permissions
into the child row's security_descriptor in insert triggers.

This works especially well with the kerberos integration, which
includes a list of groups the user is a member of in the ticket
used to connect for creating a user token. (But of course the Kerberos server gets that list by querying the directory and a DBMS can do the same to get the list of groups and roles for a user).

An equivalent might be to perform a "login" by creating a token
with a list of groups for a user and inserting that into a user_session table (already present in ACS) before doing any
any reads or writes "as" that user.

Then each stored procedure API call for a read or write has
to include the userid of the user on whose behalf the call is
being made, and calls a sub-procedure which looks up this
stuff from the connected users table before doing whatever
else it has to do.

It would be overkill if there's a simpler solution, but it
scales far beyond anything currently available with DBMS
based permissions as opposed to directory permissions and
leaves "middleware" approaches dead in the water.

Collapse
Posted by David Eison on
I think some clarification is in order for the several "Permissions has problems" posts.  Specifically, several people have talked about design & redesign, and I *think* they mean they like the requirements and goals of the system, and just need it faster.  http://acs40.arsdigita.com/doc/acs-kernel/permissions/requirements.html vs. http://acs40.arsdigita.com/doc/acs-kernel/permissions/design.html

Looking at the requirements doc, one sees that the system answers more than "Can A do B?".  Answering "Can A do B?" seems to be where most permissioning systems stop; I like that this one makes it possible to answer other questions (from http://acs40.arsdigita.com/doc/acs-kernel/permissions/requirements.html ):

"Which parties may perform operation O on target T?"

"Which operations may party P perform on target T?"

"Upon which targets may party P perform operation O?"

These questions occur less often, and so thus could afford to be slower than the primary question, but they're still important and shouldn't be forgotten, and are awfully hard to answer without a table in the RDBMS to query against (which is also a requirement for writing queries that only return appropriate objects - basically any system that isn't in the RDBMS would suck because a lot of filtering of query result sets would be required).

ADSI doesn't seem to be an answer because it looks like LDAP - "Give me info about Bob, please" and doesn't seem to answer things like "Who can read my document?" or really be meant for controlling access to items on a finer scale than a bunch of items with a few groups each (as opposed to a zillion items and a zillion groups).  If I'm missing something, please point it out, I've only read their interface whitepaper (and fail to see what it does beyond replace LDAP).

Collapse
Posted by Tom Jackson on

ADSI is a specialized directory, whereas LDAP is very general. ADSI does less than LDAP. The book _Understanding and Deploying LDAP Directory Services_ , which I would recommend not buying, characterizes ADSI as a "NOS-based directory... developed specifically to serve the needs of a network operating system."

My question is whether the yes/no answer is really that slow. I am using it in an application with over a million objects and over 150K users. I'm not using groups, however, there is no special distinction anymore between users, groups, etc, they are all objects.

I actually create private groups by setting the context_id of an object to the user_id.

It should be pointed out that having a permissioning system, and using one are two different things. Developers should carefully decide how to control access to database objects.

As and example in ACS: access to the admin pages, and thus all the objects accessible there is controlled by only a few permission records.

Collapse
Posted by Yonatan Feldman on

The permissions system has several serious problems as has been pointed out in this thread.

I worked on tuning the permissions system when it was first released. The system (after some tuning) performed very well for the common cases, i.e. "Can user X perform action Y?" and was also able to answer other questions with very reasonable performance.

Since then, more functionality was added to the core, causing the permissions system to have some of these problems. This is our fault because we did not perform any more tuning and performance testing on the system after the changes.

I think both Finn and Henry have some good ideas for optimizations that can be performed right away to speed the system up.

First I would build all views on top of base tables instead of other views. Oracle will not optimize through views (I don't know about other databases). Once the views are rebuilt, I would tune each of them individually.

One thing I see often is people using the acs_permission.permission_p in where clauses. Oracle does not optimize this either (I also don't know about other databases). This is very bad; if you must have this in a where clause wrap the rest of the query in a view on the fly with this call on the outside (to minimize the number of rows you have to execute this on).

Henry is right, the permission system is not light weight. Developers should use it at a coarser level, e.g. permission forums not individual messages, etc.

I have seen the permission system, in an earlier stage, perform exceptionally, with some work it can again.

Collapse
Posted by Don Baccus on
Thanks for this post...

One thing I see often is people using the acs_permission.permission_p in where clauses. Oracle does not optimize this either (I also don't know about other databases). This is very bad; if you must have this in a where clause wrap the rest of the query in a view on the fly with this call on the outside (to minimize the number of rows you have to execute this on).
Postgres will not optimize these either.

Ironically, DanW and I have been discussing subqueries in from clauses, which are new for Postgres, just introduced in the current 7.1 version. I mentioned that I'd manage to greatly speed up a query in a personal project of mine that used PL/pgSQL functions by doing something similar to what you suggest - creating a "view on the fly" by selecting a slice of rows in a from-subquery, so the where clause calls the PL/pgSQL functions only for the relatively small number of rows that are returned.

So, yes, this technique can help.

Collapse
Posted by Bryan Quinn on
I have been following this thread with interest. I am just now starting to look at parties/permissions/groups with intent to specify requirements and design for the system. My plan is to identify a minimally useful set of requirements over the next two weeks. I will be talking with several developers here directly to understand the issues. Anyone who wants to put something on the table, please followup to this thread or contact me directly.

I appreciate the constructive suggestions being posted here, and to echo what Yon said, we are committed to making permissions scale. Overall, there seems to be support for the requirements and the API. If anyone thinks the requirements or API are missing something, please post the specifics. Likewise, if you think it is overengineered or does more than it should at the cost to other parts of the system.

The problem appears to be in the current ACS 4.2 implementation. If we can find common patterns that lead to improved Oracle and Postgres data models, I'll be ecstatic.

Collapse
Posted by Ben Adida on
I'm going to be taking a look at this permission issue over the next few days to determine what combination of steps we can take to make this significantly more scalable than it currently is. The comments above are very useful, and I'd like to ask people to contribute more ideas, more samples of problematic situations. Let's get all the data on the table and start thinking of a combination of:

    - data model updates (if necessary)

    - data model optimizations

    - query optimizations

    - usage guidelines

that will help us reach serious scalability.

Collapse
Posted by Jon Griffin on
I thought of a quick fix, but haven't had time to implement it yet.

The big slowdown is acs_permission.permission_p (which calls that ugly view). Since it is already a pl/sql proc it might make sense to simply query the base tables one at a time and if the select count(*) returns > 0 then there is no need to query the next table(s).

This was a quick thought so I may be missing something, but we have a procedural language there is really no need to do it all in sql.

Collapse
Posted by Don Baccus on
Try it!  Also ... Oracle may be smart enough to not count all rows if you're comparing with 0, but Postgres is not.  In a Postgres form of this query we really want to use "exists".  I've not bothered when porting the APM because it's not used all that often and the number of packages and versions will grow over time perhaps to the hundreds, but  not to the tens of thousands.

But for any check on a sizable table, "exists" is the way to go in Postgres.  It's considerably faster in the "true" case, for all practical cases the same in the "false" case as counting.

Collapse
Posted by Jon Griffin on
My co-worker, Matthew Avalos, rewrote the acs_permission.permission_p package.

Here is his email to me: ---

It looks like the permissions data model isn't completely dead. At least the question, "Does party A have permission P on object O?"

I rewrote acs_permission.permission_p and got rid of the unions. That didn't help at all performance wise, but it did allow me to rewrite the selects invidually, and after a while it was clear that only the first select was the hog. I rewrote the first one using oracle exist, and that fixed it. I guess the exist short circuts the huge selects like you would expect. I did the same for the other 4 selects in the union for good measure.

btw, I'm not sure I know what some of those selects are doing, especially the one that puts a 0 in the party column.

----

I put the file on http://jongriffin.com/static/openacs/permissions/ for public scrutiny.

If anyone else would like to check it out please feel free to do so. I will commit this into OpenACS in a few days to give it a little testing period.

Collapse
Posted by Jon Griffin on
I updated the file (which was missing a couple of things) and the performance went from 1:36:60 to 00:00:43.

That is scalable! I would still like some others to check it out.

Collapse
Posted by Andrew Piskorski on
Jon, that sounds like good work. Also, there was a conversation on web/db aseveral months back which concluded that "select count(*) from dual where exists (select 1 from ... )" should always be at least as fast as "select decode(count(*),0,0,1) from ...", not to mention more portable, so you probably want to replace that too.

FYI, here's what my acs_permission.permission_p (Oracle) looks like right now. As you can see, I have not ditched the acs_object_party_privilege_map view yet as you have, despite Michael B.'s comment in the source suggesting it...

  function permission_p (
    object_id	 acs_objects.object_id%TYPE,
    party_id	 parties.party_id%TYPE,
    privilege	 acs_privileges.privilege%TYPE
  ) return char
  as
    exists_p char(1);
  begin
    -- We should question whether we really want to use the
    -- acs_object_party_privilege_map since it unions the
    -- difference queries. UNION ALL would be more efficient.
    -- Also, we may want to test replacing the decode with
    --  select count(*) from dual where exists ...
    -- 1/12/2001, mbryzek

    -- I've gone ahead and replaced it the "select decode(count(*),0,0,1)
    -- from ..." with the superior "select count(*) from dual where exists
    -- (select 1 from ... )".  However, mbryzek's comments re. union
    -- vs. union all, above, are still quite cogent...
    -- --atp@arsdigita.com, 2001/04/29 05:29 EDT

    select decode(count(*), 0,'f', 't') into exists_p
    from dual where exists (select 1
      from acs_object_party_privilege_map
     where object_id = permission_p.object_id
       and party_id = permission_p.party_id
       and privilege = permission_p.privilege
    );
    return exists_p;
  end;
Collapse
Posted by Andrew Piskorski on
DOH!  Boy do I feel dumb.  Naturally, Jon's co-worker ALREADY replaced
all the yucky "select decode(count(*),0,0,1) from ..."s, and I somehow
did not see it.  Ok, chalk it up to no coffee yet this morning...
Collapse
Posted by Matthew Avalos on
Hmm, I tried the code that Andrew Piskoski posted just for kicks ... using "exists" on the acs_object_party_privilege_map.  Our version just did a "select decode( count(*),0,'f','t') from acs_object_party_privilate_map".  Adding "exits" didn't help.  Apperarently in Oracle the "exists" functionality doesn't get distributed into the view through unions and selects (recursively at least).  I'm sure it would have worked otherwise.

And  yea, decode doesn't buy you much when you are alreadly using a procedural language.

Collapse
Posted by Andrew Piskorski on
Mathew, yeah, that makes sense.  You'd only get better performance by
that alone if it was the count(*)'s rather than all the union joins
making things slow.

Since this thread clued me in to this potential problem, I went ahead
and changed my permission_p code to select from a new
acs_object_party_priv_map_all view rather than
acs_object_party_privilege_map, where acs_object_party_priv_map_all is
an identical view, except defined with UNION ALL rather than UNION.

I did no testing at all, but my ASSUMPTION is that this COULD give
similar performance to your solution, while being a somewhat simpler
change to stick in.  In reality, it almost certainly will NOT give as
good performance, as my acs_object_party_priv_map_all is still itself
selecting from various other views, and as Yon mentioned above, it is
known that Oracle is not smart enough to optimize such cases, and
Postgres probably isn't either.

As an aside, it would be very nice to be able to specify view
definitions and the like using a language like Tcl, as doing it
directly in sqlplus (for Oracle) often means either:  One, building
lots of views which themselves select from other views, which keeps
things nice and neat but causes performance problems.  Or two,
repeating the same blocks of text over, and over, and over again,
which leads to errors when someone goes and changes view A but forgets
to also change views B through Z to match.  Basicaly, directly writing
*.sql files and shoving them through sqlplus just doesn't make sense,
at least not when it comes to defining views.

Collapse
Posted by Jon Griffin on
I updated CVS with the newest version of permission_p.

This is in the acs-permissions-create.sql file.

I encourage everyone to update.

Collapse
Posted by Peter Vessenes on
Jon, you just saved our hineys. Can we send you something nice?
Collapse
Posted by Aram Kananov on
The *pure* guy who managed to kill *ACS* in perfomance terms was me.
I'm working in aD Munich. Also i'm oracle (since 1995), scalability
and perfomace fanatic.

The sad results which i managed to *achieve* with testing scalability was:
1) using functions in sql (which procces number of rewos > 1)
2) usning null as criteria in where caluse (oracle diesn't include
null in indexes)
3) lame design of acs_messages and bboard

The permission datamodel design wasn't showstopper. the problem was
usage of it.

WBR

Collapse
Posted by Peter Vessenes on
Jon, I've posted some hard data on how your permissions package performs here at the ars bboards.

Collapse
Posted by Allan Regenbaum on
Jon... we are implementing an enterprise sized webmail system for a client (3000+ users) on ACSclassic 4.2 .  As part of our testing we added (this far) 580 users, at which point everything ground to a halt as described by this thread.

We were excited to find, and implement your bugfix ...from /web/openacs-4/packages/acs-kernel/sql/oracle from the nightly-2001-07-03.tar.tgz
<<acs-permissions-create.sql>>

Unfortunately this has not helped , and each attempt to open mailboxes causes the subsequent calls to permission_p which result in VERY dismal performance.

I read that your environment became very usable after the bugfix... do you have any suggestions as to why we dont see the same improvement ?

we are running RH 6.2 with Oracle 8.1.7 on a PIII 700 with 1G Ram
with 580 registered users, it takes roughly 10 to 15 seconds per page request... regardless of which page, or module...

Thanks in advance

Collapse
Posted by Jon Griffin on
Run analyze on your tablespace and tell me what happens.
Collapse
Posted by Don Baccus on
In the thread started by Peter over in web/db Aram mentions that the permissions system performs well if you get rid of the PL/SQL calls and include the code as part of your query.

I've suspected this would be the case for some time.  While I hate the  notion of burying dependencies on the permissions datamodel into client code, we may have to bite the bullet and do as Aram suggests, at least for crucial permissions checks that are executed very frequently.

Collapse
Posted by Jon Griffin on
We have well over 500 users and about 100 subsites/groups. Everything ran great since I ran analyze.
Collapse
Posted by Tom Jackson on

I am working with an ACS4.x system that has over 200,000 users, 2 million acs_objects, 400K rows in acs_permissions, 6 million rows in acs_permissions_all, 13 million rows in acs_object_grantee_priv_map, and an apparently uncountable number of rows in acs_object_party_privilege_map (I gave up after 20 minutes).

I have no problems with permission_p slowing things down. Am I just lucky? Maybe the pages make too many calls to permission_p?

Collapse
Posted by Don Baccus on
This is comforting news from Jon and Tom.  Certainly I'm not interested in uglifying code that uses the permissions system by breaking the package abstraction if it is at all possible to avoid doing so.
Collapse
Posted by Jon Griffin on
I just confirmed that the best way to increase performance is running analyze on the associated tables.

This along with the patch makes things extremely fast.

Collapse
Posted by Allan Regenbaum on
I have a legacy acs4.x with 500 reg'd users, have run analyze, and the patch, ...webmail sped up considerably, but Wimpypoint still comes to a grinding halt whenever it hits permissions_p.

Seems webmail makes significantly less calls to permission than does Wimpypoint.

Any suggestions would be most helpful.

Jon ... how often do you run analyze?

Collapse
Posted by Jon Griffin on
Allan, I run analyze from a batch job weekly.
I put instructions at http://jongriffin.com/static/consultant/
Collapse
Posted by Ed Avis on
In the short term the best answer is just to avoid per-item permission checking where possible.  For example, you might be able to get away with doing a single permission check on the current package id and assuming that everything else will pick up the same permissions.  This suffices for relatively unimportant things like whether or not to show an [edit] link.  For actually doing the editing you will need to do the full permission check of course.  This has worked well for me.
Collapse
Posted by Dave Hwang on

I modified Jon's upgraded acs_permission.permission_p() code and got some significant performance improvements by making two changes. First, I ditched using DECODE and WHERE EXISTS, in favor of simply using COUNT() and using sub-selects instead of joins in the group related sections. Secondly, I changed the order of precedence in the permissions check to Public, direct, group, and group_rels.

create or replace package body acs_permission 
as 
  procedure grant_permission ( 
    object_id    acs_permissions.object_id%TYPE, 
    grantee_id   acs_permissions.grantee_id%TYPE, 
    privilege    acs_permissions.privilege%TYPE 
  ) 
  as 
  begin 
    insert into acs_permissions 
      (object_id, grantee_id, privilege) 
    values 
      (object_id, grantee_id, privilege); 
  exception 
    when dup_val_on_index then 
      return; 
  end grant_permission; 
  -- 
  procedure revoke_permission ( 
    object_id    acs_permissions.object_id%TYPE, 
    grantee_id   acs_permissions.grantee_id%TYPE, 
    privilege    acs_permissions.privilege%TYPE 
  ) 
  as 
  begin 
    delete from acs_permissions 
    where object_id = revoke_permission.object_id 
    and grantee_id = revoke_permission.grantee_id 
    and privilege = revoke_permission.privilege; 
  end revoke_permission; 
  -- 
  function permission_p ( 
    object_id    acs_objects.object_id%TYPE, 
    party_id     parties.party_id%TYPE, 
    privilege    acs_privileges.privilege%TYPE 
  ) return char 
  as 
    v_count     INTEGER; 
  begin 
    -- Check public permissions 
    SELECT COUNT(*) INTO v_count 
    FROM   acs_object_grantee_priv_map 
    WHERE object_id = permission_p.object_id 
    AND   privilege = permission_p.privilege 
    AND   grantee_id = -1; 
    if v_count > 0 then 
        return 't'; 
    end if; 
    -- 
    -- Check direct permissions 
    SELECT COUNT(*) INTO v_count 
    FROM   acs_object_grantee_priv_map 
    WHERE object_id = permission_p.object_id 
    AND   privilege = permission_p.privilege 
    AND   grantee_id = permission_p.party_id; 
    if v_count > 0 then 
        return 't'; 
    end if; 
    -- 
    -- Check group permmissions 
    SELECT COUNT(*) INTO v_count
    FROM acs_object_grantee_priv_map 
    WHERE object_id = permission_p.object_id 
    AND   privilege = permission_p.privilege 
    AND   grantee_id IN (SELECT DISTINCT group_id 
                         FROM   group_approved_member_map 
                         WHERE  member_id = permission_p.party_id); 
    if v_count > 0 then 
        return 't'; 
    end if; 
    -- 
    -- relational segment approved group 
    SELECT COUNT(*) INTO v_count 
    FROM   acs_object_grantee_priv_map 
    WHERE object_id = permission_p.object_id 
    AND   privilege = permission_p.privilege 
    AND   grantee_id IN (SELECT DISTINCT segment_id 
                         FROM   rel_seg_approved_member_map 
                         WHERE  member_id = permission_p.party_id); 
    if v_count > 0 then 
        return 't'; 
    end if; 
    -- 
    return 'f'; 
  end; 
  -- 
end acs_permission; 
/ 
show errors 

The big bottleneck was the performance of the group and group-rels related permission checks, which is now much better. Everything seems to work great now, but am I missing something?

Collapse
Posted by Don Baccus on
I'm curious - did you ANALYZE the tables before running your timings of two versions (JOIN vs. IN) of the group queries?  Did you take a look at the output of EXPLAIN?

I have a lot more experience with PG and know that in that RDBMS JOINS will normally be a *lot* faster than IN.  There's been a lot of work done to optimize joins and none to optimize the kind of IN expression you've written here.

I'd like to get a handle on when it's better to do JOIN vs. IN in Oracle.  Having the Oracle version of a PL/SQL function perform queries differently than a PL/pgSQL function is OK (if properly documented so a future hacker doesn't come along and realign them for aesthetic reasons).

But things like this sure wreak havoc with our efforts to share SQL92-compliant queries unchanged in Query Files ...

Collapse
Posted by Andrew Piskorski on
Dave, just how many users and groups did you have in your system when
doing your test?  I have trouble believing that a simple "count(*)"
could have superior performance to "select count(*) from dual where
exists (select 1 ...)".
Collapse
Posted by Dave Hwang on

I've posted a trace file on three different queries at http://www.goliath.org/acs/trace-permissions.txt. The three queries you'll see are:

  1. group permission check using IN (ie. my version) (232 rows fetched)
  2. group permission check using JOIN (ie. Jon's version) (335512 rows fetched)
  3. group permission check using JOIN, without specifying privilege (ie. multirow select of all privileges of :grantee_id on :object_id) (561 rows fetched)

An interesting result is that query #3 is much faster than #2, which leads me to believe the real problem isn't with JOIN vs IN, but is with the way that "privilege" is indexed (or not). That's why I didn't include the COUNT(*) vs SELECT 1 FROM DUAL wrapper in the queries, because it doesn't seem relevant to the performance issue.

However, if someone could take a look at the trace file and help explain the results, I'd appreciate it, as I just learned how to do this since Don asked for it. =) Looking at the data model, I can't figure out where the problem is. It looks like privilege should an indexed column of acs_privileges, but since acs_object_grantee_priv_map is a view of a view, I can't really tell what's going on with privilege.

Collapse
Posted by Dave Hwang on
To answer Andrew's question: I ran my test on a system with 64 users, 38 groups, 2455 acs_objects, and 462 rows in table "acs_permissions".
Collapse
Posted by Andrew Piskorski on
That's a pretty small amount of data. Things might be different with more.

Hm, well, the trace for you're Query 2 has a "MERGE JOIN CARTESIAN" (167707 rows) in it. That sure seems suspicious... I just tried a similar query:

select decode( count(*), 0,'f', 't')
from dual where exists (
    select 'x'
      from acs_object_grantee_priv_map ogpm,
           rel_seg_approved_member_map rsmm
     where object_id = 5204
       and rsmm.member_id = 3305
       and privilege = 'admin'
       and ogpm.grantee_id = rsmm.segment_id
);
on my patched ACS 4.2 Dev site, and I did not get a cartesian join. When I run tkprof on the trace file, I see only 4 query rows, vs. 335512 rows on your system.

Now, I have even less data in my Dev system then you do, I don't have BBoard installed, and my users, groups, and permissions are different from yours, so none of this is conclusive. That merge join you're getting has got to be symptomatic of some problem, though... But whether it's something weird on your site, or a problem common to OpenACS, I don't know.

You're sure your tables are analyzed? If not, try:

begin dbms_stats.gather_schema_stats (:db_user, :percent, cascade => true); end;
(A percent of 20 or so should be good enough.)

Also, what sort of performance difference are you seeing, anyway, time-wise?

Collapse
Posted by Dave Hwang on

Andrew,

I'm running the ACS 4.2 tarball from the ACS Repository, not OpenACS. To verify, I did a diff on my acs-permissions-create.sql and the one from the stock tarball, and they were identical.

I don't really understand how the permissions via relational segments is supposed to work. I'm not sure how to get rel_seg_approved_member_map.segment_id to get inserted as acs_object_grantee_priv_map.grantee_id so I can't run a similar query to what you have. Everything I do to try to create a relational segment just ends up populating group_approved_member_map instead.

I haven't run analyze or dbms_stats yet. I need to learn how to do that next. In terms of performance, the difference between IN and JOIN is over a second on my system, per call to acs_permission.permission_p. For a bboard thread of 10 messages, it takes over 10 seconds to return the page. Like you said, I don't have much data yet, but my site had gotten progressively worse as what little load I had increased.

I know the cartesian join is the problem. The data model is kind of hairy, but I think the problem lies somewhere in the fact that objects, privileges and grantees (ie. parties) all have a heirarchical structure. We already have to compare a privilege and all its super-privileges to an object and all the contexts that contain it, and when we throw groups into the mix (grantee and all its group memberships) using JOIN, we end up with a cartesian join, but by using IN, we avoid it by explicitly splitting the query in a way that the optimizer is not choosing on its own. That's just my guess, anyway.

Collapse
Posted by Jon Griffin on
If you haven't run ANALYZE your results are meaningless.
See http://jongriffin.com/static/consultant/analyze for a script.
Collapse
Posted by Don Baccus on
Ditto Jon's answer and I want to emphasize it is equally true for PostgreSQL (VACUUM ANALYZE).

The only performance analysis that will help us measure scalability is that performed on installations with meaningful amounts of data (I think you pass this test) with statistics properly gathered as will be done for a production site.

Both DBs will generate horrible plans if tables aren't analyzed.  Both will do so occassionally even if they *are* properly analyzed but not nearly as often.

Collapse
Posted by Dave Hwang on

Jon, thanks for the script. I ran autoanalyze.sql and then re-ran the trace. The new results are available at http://www.goliath.org/acs/trace-permissions-after-analyze.txt (old results). Here is a summary of the results:

   Query                    Before ANALYZE     After Analyze
==========================  =================  ==============
1. using IN                 query 232 rows     query 949 rows
2. using JOIN               query 335512 rows  query 8207 rows
3. using JOIN, not          query 561 rows     query 311 rows
   specifying priv
4. using IN and SELECT 'x'      n/a            query 950 rows
   instead of COUNT(*)

So I see that query 1 using IN got worse, and that queries 2 and 3 using JOIN got much better. However, there is still an order of magnitude difference between using IN and using JOIN (949 vs 8207).

Collapse
Posted by Don Baccus on
The fact that fewer rows are processed doesn't necessarily mean the query runs faster.  Note that "explain" shows that the "in" form requires two sorts and a hash join, while the "join" form requires one sort and a merge join.  hash joins can be expensive as the hash tables have to be built before the join can commence.  Then again merge joins require the operands be sorted (which is why one is being explicitly sorted).

Both of these plans look like good ones to me.  As to which runs faster you either need to find a slower computer (since they both report "CPU 0.00") or insert a lot more data ...

Collapse
Posted by Andrew Piskorski on
Jon, why do you recommend that analyze script for Oracle rather than
simply using dbms_stats.gather_schema_stats() ?
Collapse
Posted by Jon Griffin on
Strictly because it is easy to set up in a cron job, and only analyzes what needs to be analyzed. Plus it emails me to remind me it was done.

Either way works.

Collapse
Posted by Andrew Grumet on
Okay -- dumb question.  I'm looking over permission_p distributed with the latest OACS and I can't figure out why the function doesn't begin with a query that checks for direct permission in the acs_permission table.  Am I missing something?
Collapse
Posted by Arjun Sanyal on
Reviving an old (but good) thread...

Andrew: I'm wondering the same thing i.e. why does the permission_p
function check the group and rel segment perms before the direct perms. Shouldn't the checks (from first to last: group, relseg, direct, public, public-like) be ordered from most common to least common?

Collapse
Posted by Don Baccus on
Hmmm...someone should study a real live installation and see which check bears fruit most often.

Direct perms to a user?  I'd think that "registered users can read" is more common and that's probably why the group check's there first?

But I don't know ... it's all guesswork without data.

Collapse
Posted by Andrei Popov on

Reviving the dead again -- in calendar code, query get_viewable_calendar called in /calendar/admin/calendar-preferences is a killer (aside from a strange state in which calendar code is, as there are some unported queries, etc.). It uses calendar__readable_p finction that is basically this:

CREATE FUNCTION calendar__readable_p(
        integer,
        integer
)
RETURNS boolean
AS 'declare
        readable_p__calendar_id         alias for $1;
        readable_p__party_id            alias for $1;
        v_readable_p                    boolean;

    begin
        select (case count(*)
                when 1 then true
                        else false
                ) into v_readable_p
        from    acs_object_party_privilege_map
                where   party_id = readable_p__party_id
                and     object_id = readable_p__calendar_id
                and     privilege = ''calendar_read'';

        return v_readable_p;
end;'
LANGUAGE 'plpgsql';

First of all, readable_p__party_id seems better be aliased to $2, not $1. case must be re-written, as this syntax is incorrect. After my wimpy changes it looks thusly:

CREATE FUNCTION calendar__readable_p (
	integer,
	integer
)
RETURNS boolean AS '
DECLARE
	readable_p__calendar_id		alias for $1;
        readable_p__party_id            alias for $2;
	v_readable_p			boolean;
BEGIN

        /*
         *
         * Using acs_object_party_privilege_map is EVIL -- it takes AGES!
         * We only need objects that are labeled as cal_item, hence...
         *
         */

        select case 
                when count(*) = 1 then ''t''
                else ''f''
               end into v_readable_p
        from dual where exists (
                select  1
                from    acs_object_party_privilege_map 
                where   party_id = readable_p__party_id
                        and object_id = readable_p__calendar_id 
                        and privilege = ''calendar_read''
        );

	RETURN v_readable_p;
END;
' LANGUAGE 'plpgsql';

...and kills OACS outright. A simple select calendar__readable_p(o.object_id, 4683) as readable_p from acs_objects o; (where 4683 is my id) has been running for some 10-20 minutes already and there are only 400 or so objects in acs_objects. FWIW, analyze has just been run...

I'l try to re-write acs_object_party_privilege_map to not use a cascade of UNIONs on views on views on views, and see whether this helps...

Collapse
Posted by Don Baccus on
Try switching it to all_party_privilege_map.  Typically this has helped a lot and I've been wiping out acs_party_privilege_map usage as I find time to do so.

This view's usually fast in Oracle, too.

It also uses a union but it doesn't explode exponentially the way the union in acs_party_privilege_map does.

Anyway ... I'd appreciate it if you'd give it a try and let us know if it speeds things up or not.  If so, I'd more than welcome a patch submission to the SDM to both fix the bugs you've pointed out plus the speed fix, and will put it into the 4.5 tree.

If this doesn't speed things up significantly I'll think harder :)

Collapse
Posted by Andrei Popov on
Yeah, all_objects_party_privelege_map sped things up quite a bit... I guess that Oracle view uses CONNECT BY while in Postgres we had to fall to a work-around (however ellegant) with tree_right() that is probably not as efficient as internal implementation of CONNECT BY?... May it make sense to try and make tree_right() a C function instead of PL/PGSQL one?
Collapse
Posted by Don Baccus on
No, it's not the tree query, it's the UNION.  The acs_party_privilege_map UNIONs up the results of SELECTs that return a huge number of rows.  Those selects aren't qualified in any way, they're returning a huge combination of various tables in the system.

You use this view in queries that do qualify the result, of course.  Often you're just testing a single object, for instance.

But Postgres first builds the huge UNION - millions of rows, frequently - then performs the rest of the query on that result.

You can see that it's going to be slow because the resultset size grows in proportion to the product of the number of rows in various tables.

I've read some Oracle documentation that indicates that Oracle's no smarter for queries like this.

The replacement all_party_privilege_map - added by aD because of the horrible performance of acs_party_privilege_map, BTW - still contains a UNION but as I mentioned above, it is less evil.  It only grows linearly in proportion to the rows in the parties and relations tables.

Linear growth we can handle, N^2 or N^3 we can't.

Do you have time to bundle up a patch with your change and bug fixes?  If so I'll get it into the 4.5 release ...

Collapse
Posted by Don Baccus on
OK, I've misnamed the views involved but y'all get the idea, right? :)
Collapse
Posted by Jay Dubanik on
Finally I found this thread, I was already waried that there is something wrong with my server.

Somebody here mentioned that we should call ad_permission_p proc wisely.

I have noticed that this proc is called for every little object (image) you add to your adp page.
So if you have 10 little thumbnails added to your index.adp page( and keep them in /image directory) then ad_permission_p is called for every one of them, do you like or not.

Question: How do you guys avoid it? I don't want to check for permissions on every little image.

Collapse
Posted by Don Baccus on
4.6 will cache the permission check (strange, I thought I'd checked that in but apparently haven't, I will later today though).  This should help considerably.

We've also talked about specifying a special directory for static files that don't need to be checked, but that won't happen until 4.7.

Collapse
Posted by Jay Dubanik on
Hi Don,

Thats exactly what I did, I have specified one directory /image that will never be checked for permissions.

I have immediately gone from 16 ad_permission_p calls to 1
and total reqest time for the page went from 51 seconds to 3.9.

Of course 3.9 is still too big but, thats for later.

janus

Collapse
Posted by Jon Griffin on
Something else is wrong. You need to vacuum your db or something.