Forum OpenACS Development: permission_p revisited
I'd like to revive that discussion since I'm tuning permission_p with real data (87,000 objects, many are 5 generations deep etc etc).
Right now permission_p is a series of 5 queries. The proc returns when it hits its first 't' or else returns 'f' after running all 5. Here is a quick summary with an indication of relative cost:
- Group permission: expensive
- Rel segment permission: expensive
- Direct permission: cheap
- Public permission: expensive
- Public-like permission: cheap
These are the average number of seconds spent on each query over about 24 calls.GROUP_ REL_SEG DIRECT PUBIC PUBLIC_LIKE ---------- ---------- ---------- ---------- ----------- .281416667 .200166667 .003875 .272708333 .002166667
Moving on...what can we do to make things better? One option is to go after the individual queries. This may yet bear some fruit.
[Aside: One thing I don't grok is the join with the users table in the "public" query. When a non-registered member of the public makes a request, the auth system will return 0 as their user_id. But the users table doesn't have an entry for user_id=0, so this query really is checking for registered users, right? But if that's the case, why is it called "public" and why is the check for grantee_id=-1 (the public) ?]
Another thing we can do is re-arrange the queries for maximum efficiency. Since the public-like query is cheap and will very often return 't' (most sites have a lot of public pages), there may be a general benefit from putting it first. We might as well also put the other cheap query---direct permissions---near the top. After that the results would probably be mixed.
A third approach would be to either add extra functions or parametrize permission_p to give developers more control. If for example, I am willing to say that a particular permission should never be granted to the public, can I use that information to my benefit by calling a different function or skipping some of the queries?
I wanted to put these out ideas out here because I know there has been a lot of good thinking and this may not be reflected in CVS. I hope to generate discussion so we can recapture the good thinking, make some calls and get the changes in.
We have a big database of migrated content so I'm happy to do some trial and error and report back here.
I've found that acs_object.name and site_node.url are slower than permission_p in many cases.
the where clause, we're just getting a list of id's and
permission_p for each one. Example: we display on the
homepage a list of classes I am taking. We want to display a
little "admin" link to the administration page for each clas I am a
professor for. So we compute whether or not I have admin
privilege for each class. This can can maybe half a second per
row, bogging things down if I happen to be a professor for a
number of classes.
How long does an average permission check take on your
have time to try to figure out which one was correct.
the ordering we chose was direct (very fast), then public, then the rest.
Yes, joining with this as part of your query that selects classes may well be faster. It's worth a shot and since you're in measuring mode already, Andrew, you're a good candidate to take that shot :)
If you have time and want to report back it would add to our knowledge of how to work around permissions performance issues, regardless of the experiment being a failure or success.
As far as reordering goes ... you have some good, solid data and it sounds like OpenForce is getting some, too. Jon Griffin and I both played around with ordering many moons ago and settled on what you see now. I find it interesting that OpenForce appears to be discovering that the optimal order may differ between PG and Oracle.
<p>I don't have an answer for your question about the join with the users table in the public query. It looks very strange, doesn't it?
Are you using the Java or the C version of Tom Kyte's code?
I'm tied up with other stuff today, but I'll try to hit this again over the weekend.
If someone wants to look into which one is correct or what is going on that would be great.
Nevertheless there is probably still room to improve permission_p:
-- rip out "public" check or figure out why it's there
-- move fast checks to top
Don, how do you suggest we proceed?
The pg code and oracle code compute the same results, the former with one block and the latter with two.IF (permission is granted to "the public") AND ( (person is logged in) OR (person is not logged in) ) THEN return "true"
This is moderately confusing because we have an odd mix of data model logic and application logic. The auth system uses user_id=0 to designate someone who is not logged in*. The data model uses party_id=-1 to designate "the public" which includes everyone, logged-in or not.
Whether "non-logged-in person" and "the public" are semantically equivalent is an interesting debate but besides the point. Anyone with a rudimentary understanding of logic will quickly see that "a AND (b OR (not b))" is just "a". Thus we can elminate this part of the logic from both incarnations.
The only other significant difference between oracle and pg versions is ordering. Pg puts the fast stuff up front
*this despite the fact that select count(*) from users where user_id=0 returns no rows
- the party_id argument represents something other than a user
- the application calling permission_p assumes that only user parties should get permission via grants to "the_public"; groups and other types of parties should not get permission
Regarding the impact of such a change, though we often *grant* permission to non-user parties, I don't know of any cases where we *query* whether these types of parties have permission. Does anybody else?