Forum OpenACS Development: Re: OpenACS Permissions System - Performance Improvement Work

Don, Jeff,

Thanks for the replies. Before reading them I had already done some more work on this so what you see below was done before responding to your posts.

I knew that you would need to see the query plans in full, and initially when I drafted the first post I had them in, but it was all a bit too much at once! :-) I thought I should narrow the choices a little first. You’ll find them below.

Firstly, to cover Jeff’s specific comments and questions:

- How many users and objects? Around 15000 objects, only about 7 discreet users but some groups and a hierarchical set of compositions of the above. It is a test system.

- When you say that you get different results do you mean in absolute terms (execution time) or in relative terms? I am working with PostgreSQL 7.4.2. I accept that database tuning can make enormous differences to execution times, as can summary tables, de-normalisations and differences in query planning schemes, but in the end one factor is how hard the problem is that the DB is being asked to solve. I have done no specific tuning work on this DB beyond increasing the shared memory buffers and other ‘standard’ openacs things. I have not run analyse very much to take advantage of stats – but then on a test system that is less useful since there are not hordes of people trying to prise the data out at the same time. :-)

- I suggested the additional index on acs_permissions primarily with an eye on the possibility of guiding the planner to go to the index rather than to the data pages. It was an afterthought really. I remember that putting indices on columns with a small number of discreet values is often a bad idea. Perhaps that suggestion is only half baked!

- I don’t know anything about partial indices. I can guess what they are but have never seen a live one, and I am sure that regular running of analyse would speed things up. I am testing relative times here since I don’t know of a better way to compare options.

- I am not really sure that I fully understand your reasoning in relation to the planner settings. The default value for the geqo_threshold is 12 and none of the queries refer to more than 4 tables. For that reason I believe that these query plans are all fully deterministically produced, so changing the parameter that determines when the planner goes to genetic planning seems illogical. Even the from_collapse_limit (default value 8) will not be reached by our test queries.

- One final point, one of the reasons for looking at the new query structure as an option is that it may reduce the number of possible execution plans that the planner must explore. This article describes the process although the example is not identical in structure http://jamesthornton.com/postgres/7.1/postgres/explicit-joins.html

Ok, here is the work that I did yesterday:

#######
STAGE 3
#######

Here are the benchmarks for three variants of the recommended query structure against the acs_object_party_privilege_map view. The first variant of the query is as presented in the original benchmark TEST A-1).

Original Test:

explain analyse select 1 from acs_object_party_privilege_map ppm where ppm.party_id=-1 and ppm.privilege='admin' and ppm.object_id=16631;

Total runtime: 223.259 ms

i) explain analyse select exists(select 1 from acs_object_party_privilege_map ppm where ppm.party_id=-1 and ppm.privilege='admin' and ppm.object_id=16631);

Total runtime: 501.860 ms

ii) explain analyse select 1 where exists(select 1 from acs_object_party_privilege_map ppm where ppm.party_id=-1 and ppm.privilege='admin' and ppm.object_id=16631);

Total runtime: 501.337 ms

iii) explain analyse select 1 from dual where exists(select 1 from acs_object_party_privilege_map ppm where ppm.party_id=-1 and ppm.privilege='admin' and ppm.object_id=16631);

Total runtime: 501.718 ms

The obvious conclusion here is that with the optimiser in PostgreSQL 7.4.2. there is no advantage in selecting from dual, or using exists with this query structure. In fact, the use of exists actually doubles the execution time. Hopefully this collection of tests will provide a satisfactory baseline from which we can work.

Remember that TEST B-5 took 1.186ms to produce the same result. My personal view is that this represents an improvement (unless I have made some mistake(s) somewhere!).

I fully understand the issues with changing the existing method and data model, however if we can in fact realise a significant performance gain I would think that it would be worth phasing in a change over time.

I also understand your concern about the consequences of abstracting this into the function acs_permission__permission_p however developers must surely retain responsibility for understanding the performance consequences of various query structures and selecting accordingly. In any event we should perhaps test this rather than assume that there will be a problem (maybe the query planner does a better job now?). If we remain concerned about existing code then we could retain the view for driving acs_permission__permission_p calls for legacy code.

In any event, I agree with your point 3b, nesting a permission call in such a way that it is called for every row in the outer loop is asking for trouble. That is another reason why I think this other structure will work well because it allows us to guide the planner a little more firmly to the execution plan that we know will work well. It reduces the planner’s options!

To re-iterate - the biggest advantages of using the table I have created are (I think):

1) The table is structured so that a simple query returns the party_id and all its ancestor_ids with no duplicates.

2) This is achieved (I think) with reference to the index.

3) The de-duplication occurs when the rows are inserted, which only happens when group memberships are changed. This ensures that the work is done once only and it takes place during a non time critical activity.

4) There are no unicode enabled text fields to slow things down.

The advantages of the alternative query structure are (I think):

1) It avoids ever having to de-normalise portions of the underlying triple hierarchy into a single dataset.

2) Instead it extracts the relevant rows from each hierarchy into three duplicate free datasets (which will tend to be a small number of rows) and then only has to join the acs_permissions table to those three small datasets.

For completenes, this is what happens if I make a view that has only the two integer key fields as per rh_group_struct thereby avoiding the need to maintain the table with triggers.

Create the view:

create view rh_group_struct_view (group_id, element_id) as select distinct element_id as group_id, element_id from group_element_index union select group_id, element_id from group_element_index;

\d rh_group_struct_view

View "public.rh_group_struct_view"
   Column   |  Type   | Modifiers 
------------+---------+-----------
 group_id   | integer | 
 element_id | integer | 
View definition:
 SELECT DISTINCT group_element_index.element_id AS group_id, group_element_index.element_id
   FROM group_element_index
  ORDER BY group_element_index.element_id, group_element_index.element_id
UNION 
 SELECT group_element_index.group_id, group_element_index.element_id
   FROM group_element_index;
Here is a re-run of the tests for the alternative query structure using rh_group_struct_view instead of the rh_group_struct table with indices:

TEST A-6

explain analyse select 1 from acs_permissions where 
privilege in (select descendant from acs_privilege_descendant_map where privilege='admin') 
and object_id in (select ancestor_id from acs_object_context_index where object_id=16631) 
and grantee_id in (select group_id from rh_group_struct_view where element_id=448) 
limit 1;

Total runtime: 16.717 ms
TEST B-6

explain analyse select 1 from acs_permissions where privilege in (select descendant from acs_privilege_descendant_map where privilege='admin') and object_id in (select ancestor_id from acs_object_context_index where object_id=16631) and grantee_id in (select group_id from rh_group_struct_view where element_id=-1) limit 1;

Total runtime: 236.622 ms

So, that is significanty worse in both cases! My conclusion here is that (subject to tuning wizardry beyond my knowledge!) it would be better if we avoid using a view. The reason given for not fully denormalising the triple hierarchy was resource usage, so we were left with little choice but to use a view. But if now the query planner can cope with queries of the structure that I have tested here, then perhaps it would be wise to take advantage of that improvement since then we would not need to de-normalise the whole triple hierarchy anyway. I agree that we would need to test the effects of an abstraction into permission::permission_p and usage in both select and where clauses (perhaps the PostgreSQL team have sorted this planning issue as well?). I wasn't aware of the original constraints that you were working around and so these suggestions may be borne of 'constructive naivety', but maybe the constraints are no longer there.:-)

For completeness here are the complete execution plans for the original benchmark query, the recommended query using 'exists' and finally the fastest of the alternative queries for you to see:

Original benchmark query:
=========================

explain analyse select 1 from acs_object_party_privilege_map ppm where ppm.party_id=-1 and ppm.privilege='admin' and ppm.object_id=16631;
QUERY PLAN                                                                                  
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=51.32..268.34 rows=8 width=0) (actual time=219.645..219.645 rows=0 loops=1)
   ->  Hash Join  (cost=51.32..220.65 rows=6 width=4) (actual time=219.640..219.640 rows=0 loops=1)
         Hash Cond: ("outer".grantee_id = "inner".party_id)
         ->  Nested Loop  (cost=0.00..168.84 rows=57 width=8) (actual time=0.336..193.543 rows=10781 loops=1)
               ->  Seq Scan on acs_privilege_descendant_map pdm  (cost=0.00..2.05 rows=1 width=68) (actual time=0.197..0.311 rows=1 loops=1)
                     Filter: ((descendant)::text = admin'::text)
               ->  Index Scan using acs_permissions_privilege_idx on acs_permissions p  (cost=0.00..166.08 rows=57 width=76) (actual time=0.124..137.987 rows=10781 loops=1)
                     Index Cond: (("outer".privilege)::text = (p.privilege)::text)
         ->  Hash  (cost=51.27..51.27 rows=18 width=4) (actual time=0.096..0.096 rows=0 loops=1)
               ->  Index Scan using party_member_member_idx on party_approved_member_map pamm  (cost=0.00..51.27 rows=18 width=4) (actual time=0.080..0.086 rows=1 loops=1)
                     Index Cond: (member_id = -1)
   ->  Index Scan using acs_object_context_index_pk on acs_object_context_index c  (cost=0.00..7.93 rows=1 width=4) (never executed)
         Index Cond: ((c.object_id = 16631) AND (c.ancestor_id = "outer".object_id))
 Total runtime: 219.995 ms
(14 rows)

The recommended query:

======================

explain analyse select exists(select 1 from acs_object_party_privilege_map ppm 
where ppm.party_id=-1 and ppm.privilege='admin' and ppm.object_id=16631);
QUERY PLAN                                                                                      
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=554.75..554.76 rows=1 width=0) (actual time=499.768..499.770 rows=1 loops=1)
   InitPlan
     ->  Nested Loop  (cost=0.00..554.75 rows=8 width=0) (actual time=499.757..499.757 rows=0 loops=1)
           ->  Nested Loop  (cost=0.00..507.05 rows=6 width=4) (actual time=499.753..499.753 rows=0 loops=1)
                 ->  Nested Loop  (cost=0.00..168.84 rows=57 width=8) (actual time=0.359..207.760 rows=10781 loops=1)
                       ->  Seq Scan on acs_privilege_descendant_map pdm  (cost=0.00..2.05 rows=1 width=68) (actual time=0.206..0.322 rows=1 loops=1)
                             Filter: ((descendant)::text = 'admin'::text)
                       ->  Index Scan using acs_permissions_privilege_idx on acs_permissions p  (cost=0.00..166.08 rows=57 width=76) (actual time=0.136..144.241 rows=10781 loops=1)
                             Index Cond: (("outer".privilege)::text = (p.privilege)::text)
                 ->  Index Scan using party_approved_member_map_pk on party_approved_member_map pamm  (cost=0.00..5.92 rows=1 width=4) (actual time=0.021..0.021 rows=0 loops=10781)
                       Index Cond: ((pamm.party_id = "outer".grantee_id) AND (pamm.member_id = -1))
           ->  Index Scan using acs_object_context_index_pk on acs_object_context_index c  (cost=0.00..7.93 rows=1 width=4) (never executed)
                 Index Cond: ((c.object_id = 16631) AND (c.ancestor_id = "outer".object_id))
 Total runtime: 500.130 ms
(14 rows)

The suggested alternative query

===============================

explain analyse select 1 from acs_permissions where privilege in (select descendant from acs_privilege_descendant_map where privilege='admin') and object_id in (select ancestor_id from acs_object_context_index where object_id=16631) and grantee_id in (select group_id from rh_group_struct where element_id=-1) limit 1;

QUERY PLAN                                                                   
------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=3.58..174.65 rows=1 width=0) (actual time=0.811..0.811 rows=0 loops=1)
   ->  Nested Loop IN Join  (cost=3.58..174.65 rows=1 width=0) (actual time=0.806..0.806 rows=0 loops=1)
         ->  Hash IN Join  (cost=3.58..170.67 rows=1 width=4) (actual time=0.802..0.802 rows=0 loops=1)
               Hash Cond: (("outer".privilege)::text = ("inner".descendant)::text)
               ->  Nested Loop  (cost=1.53..168.32 rows=57 width=72) (actual time=0.121..0.121 rows=0 loops=1)
                     ->  HashAggregate  (cost=1.53..1.53 rows=1 width=4) (actual time=0.119..0.119 rows=0 loops=1)
                           ->  Seq Scan on rh_group_struct  (cost=0.00..1.52 rows=1 width=4) (actual time=0.114..0.114 rows=0 loops=1)
                                 Filter: (element_id = -1)
                     ->  Index Scan using acs_permissions_grantee_idx on acs_permissions  (cost=0.00..166.08 rows=57 width=76) (never executed)
                           Index Cond: (acs_permissions.grantee_id = "outer".group_id)
               ->  Hash  (cost=2.05..2.05 rows=1 width=68) (actual time=0.654..0.654 rows=0 loops=1)
                     ->  Seq Scan on acs_privilege_descendant_map  (cost=0.00..2.05 rows=1 width=68) (actual time=0.182..0.555 rows=27 loops=1)
                           Filter: ((privilege)::text = 'admin'::text)
         ->  Index Scan using acs_object_context_index_pk on acs_object_context_index  (cost=0.00..7.93 rows=1 width=4) (never executed)
               Index Cond: ((acs_object_context_index.object_id = 16631) AND ("outer".object_id = acs_object_context_index.ancestor_id))
 Total runtime: 1.176 ms
(16 rows)
[Now you’re going to tell me that my query plans reveal my ignorance! J]

I really hope that this proves to be useful. I have spent quite a few hours producing this data so I hope that we will be able to derive some benefit from it. Perhaps once the inevitable debate has been concluded we could decide what action is to be taken (if any). That would enable me to progress with the work that I am planning.

I also agree strongly that stripping out the unnecessary complexity of the ‘privileges’ hierarchy will lead to further significant performance improvements. Any reduction in the number of rows returned by the subselect will produce a reduction in the size of the joined dataset.

My personal view is that since the number of permissions calls grows directly in proportion to the number of objects (recognising a party as an object also) in the system and the granularity of permissioning required, and that the number of rows in the de-normalised hierarchy is polynomial with respect to the depth and complexity of the hierarchical relationships between objects, we should do everything we can to reduce the unit cost of any permission request. Doing so can only increase the scaleability of our system and defer the point of final meltdown!

All of this comes with the caveat that I may not have implemented something correctly so please continue to be critical in your appraisal and let me know if my query doesn't do what I think it does!

I hope very much that we can overcome the practical issues associated with making a change.

Regards Richard.

Collapse
Posted by Andrew Piskorski on
So, this thread demonstrates that the display width of posts in openacs.org Forums broken yet again. Somebody with access please fix this. It's a bit difficult to follow a discussion when all posts above are forced into being incredibly wide, solely because the 12th post in this thread, above, is very poorly formatted.

(And we've only been repeatedly fixing and re-breaking this obvious feature for years now, people...)