Forum OpenACS Development: acs-kernel acs_permission__permission_p performance

I have been encountering poor performance in this Postgresql function.

This function is written as:

begin
  return exists (select 1
                from acs_permissions p, party_approved_member_map m, ...
end;

If I re-write it as:

begin
  return 1 <= (select count(*)
                from acs_permissions p, party_approved_member_map m, ...
  ...
end;

The original version can take *seconds* on my system.  The latter version never takes over 100ms to run.  This is with Postgresql 7.4.6.  The timings are derived from Postgresql logging, i.e. enabling log_statement and log_duration. (FYI I have stats_row_level is enabled to support pg_autovacuum). This appears to be a Postgresql issue.  I'm curious to  know if anybody else can replicate my results and whether anybody is running with Postgresql 8.0.0 and if it gives the same results. I recall some notes about certain queries being slowed down a lot by stats_row_level, but I haven't gone back to check this out, and I really like using pg_autovacuum.

Comments?  Ideas?  Have I hit a true bug in Postgresql, or is it just poorly tuned?

Collapse
Posted by Don Baccus on
I have noticed poor plans resulting from PG 7.4 in several cases where performance seemed better in earlier versions.

In particular the "exists" version of the query should NEVER EVER be slower than the "count(*)" version. "exists" can stop when it sees the first row returned from the query. The "count(*)" version has to count EVERY ROW.

So something's screwed up.

And it bums me out, I put a great deal of effort into optimizing these queries back in the 7.1 or 7.2 days and having version flip-flops in terms of performance as we seem to be seeing is a huge hassle. How can we support both 7.3 and 7.4 when the planner gives us vastly different (shittier) query plans for 7.4? And if 8.0 flips it back ... what can we depend on?

I would be interested in the query plans (EXPLAIN) for the two queries on your system.

Randy,

Perhaps the 'EXPLAIN' output for the two queries could be sent to whoever is responsible for the query planner on the PostgreSQL team. Definitely sounds like something not quite right that they should know about.

R.

Collapse
Posted by Jeff Davis on
I am running 8.0.0 locally for testing and I find the former query very fast on my system. One possibility is that since
it's a function the plan may have been created before you analyzed and just has a poor plan. In any case it would be interesting to see the explain analyze output for both versions to try to understand what's going on.
Collapse
Posted by Don Baccus on
Jeff makes an excellent point - to clarify, PG/plSQL functions are compiled when first encountered by a backend and are NOT recompiled if you ANALYZE.

So to make sure that all function calls make use of the data gathered by ANALYZE you need to restart AOLserver in order to flush backend connections. On restart, as each pool handle starts up a backend connection, the first call to each function will generate a new execution plan.

The other alternative is to do a CREATE OR REPLACE of critical functions in order to force a recompile but in practice restarting AOLserver's easier.

This would only explain your problem if your OpenACS instance has been running without interruption for a fairly long time and if your installation has grown.

The underlying issue is that a good plan for tables which are nearly empty may be a very bad plan for tables which have lots of data ...