Forum OpenACS Development: Slow Query in Logger

Collapse
Posted by Alex Kroman on
We have a ton of data in project manager and logger and it is starting to get quite slow so I've been trying to optimize a lot of the queries by adding some missing indexes. So far the changes have sped up project manager quite a bit and I hope to add them into the next release.

There is still however one query that I haven't been able to speed up. On my machine it takes on average 5 seconds to execute pretty much nullifying the rest of the speed games.

The query is select_users from logger/lib/entries.xql.

If anyone has any suggestions I'd love to hear them.

Here it is:

select submitter.first_names || ' ' || submitter.last_name as label,
                       submitter.user_id as user_id
    	    from   cc_users submitter,
    	           logger_entries le,
    	           acs_objects ao
    	    where  ao.object_id = le.entry_id
    	    and    submitter.user_id = ao.creation_user
    	    and    exists (select 1
                               from   logger_project_pkg_map
                               where  project_id = le.project_id
                               and    package_id = :package_id)
    	    group  by submitter.user_id, submitter.first_names, submitter.last_name
Collapse
2: Re: Slow Query in Logger (response to 1)
Posted by Dave Bauer on
First

Join on acs_users_all instead of cc_users. This should greatly speed up your query.

Collapse
3: Re: Slow Query in Logger (response to 1)
Posted by Alex Kroman on
Hi Dave,

It speeds it up a little bit but it's still taking anywhere from 4-6 seconds on my system. Any other ideas?

I put the following indexes on the logger tables.

CREATE INDEX logger_variables_package_id_inx ON logger_variables(package_id);
CREATE INDEX logger_projects_inx ON logger_projects(project_lead);
CREATE INDEX logger_projections_variable_id_inx ON logger_projections(variable_id);
CREATE INDEX logger_projections_project_id_inx ON logger_projections(project_id);
CREATE INDEX logger_project_var_map_variable_id_inx ON logger_project_var_map(variable_id);
CREATE INDEX logger_project_pkg_map_package_id_inx ON logger_project_pkg_map(package_id);
CREATE INDEX logger_entries_variable_id_inx ON logger_entries(variable_id);
CREATE INDEX logger_entries_project_id_inx ON logger_entries(project_id);

Collapse
4: Re: Slow Query in Logger (response to 1)
Posted by Dave Bauer on
Make sure you have vacuum analyzed the database.

The please do

explain analyze (your query) in psql and post the resulting query plan.

Collapse
5: Re: Slow Query in Logger (response to 1)
Posted by Alex Kroman on
Here you go:

ibr=# explain analyze select submitter.first_names || ' ' || submitter.last_name as label,
ibr-# submitter.person_id as user_id
ibr-# from persons submitter,
ibr-# logger_entries le,
ibr-# acs_objects ao
ibr-# where ao.object_id = le.entry_id
ibr-# and submitter.person_id = ao.creation_user
ibr-# and exists (select 1
ibr(# from logger_project_pkg_map
ibr(# where project_id = le.project_id
ibr(# and package_id = '784')
ibr-# group by submitter.person_id, submitter.first_names, submitter.last_name
ibr-# order by submitter.first_names, submitter.last_name
ibr-# ;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=111427.29..111427.46 rows=69 width=23) (actual time=7617.886..7617.902 rows=46 loops=1)
Sort Key: submitter.first_names, submitter.last_name
-> HashAggregate (cost=111424.84..111425.18 rows=69 width=23) (actual time=7617.251..7617.485 rows=46 loops=1)
-> Hash Join (cost=94996.07..111305.26 rows=15944 width=23) (actual time=1007.526..7483.693 rows=32715 loops=1)
Hash Cond: ("outer".creation_user = "inner".person_id)
-> Hash Join (cost=94994.21..111062.16 rows=16358 width=4) (actual time=1006.676..7352.151 rows=32715 loops=1)
Hash Cond: ("outer".object_id = "inner".entry_id)
-> Seq Scan on acs_objects ao (cost=0.00..8554.95 rows=319695 width=8) (actual time=0.081..931.071 rows=319695 loops=1)
-> Hash (cost=94897.31..94897.31 rows=16358 width=4) (actual time=1006.113..1006.113 rows=0 loops=1)
-> Seq Scan on logger_entries le (cost=0.00..94897.31 rows=16358 width=4) (actual time=0.808..911.293 rows=32715 loops=1)
Filter: (subplan)
SubPlan
-> Index Scan using logger_project_pkg_map_un on logger_project_pkg_map (cost=0.00..5.75 rows=2 width=0) (actual time=0.019..0.019 rows=1 loops=32715)
Index Cond: ((project_id = $0) AND (package_id = 784))
-> Hash (cost=1.69..1.69 rows=69 width=23) (actual time=0.771..0.771 rows=0 loops=1)
-> Seq Scan on persons submitter (cost=0.00..1.69 rows=69 width=23) (actual time=0.177..0.437 rows=69 loops=1)
Total runtime: 7620.962 ms
(17 rows)

Collapse
6: Re: Slow Query in Logger (response to 1)
Posted by Dave Bauer on
The sequential scan on acs_objects certainly isn't helping. Its also doing a sequential scan on persons, but that depends on how many persons there are if its a problem.

I don't see any reason it would be doing a sequential scan on acs_objects. Hopefully someone else will have an idea!

Collapse
7: Re: Slow Query in Logger (response to 1)
Posted by Alex Kroman on
Something is happening to screw up the query optimizer.

when I do:

set enable_seqscan to off;

and rerun it I get:

Sort (cost=152847.41..152847.58 rows=69 width=23) (actual time=1873.434..1873.449 rows=46 loops=1)
Sort Key: submitter.first_names, submitter.last_name
-> HashAggregate (cost=152844.96..152845.30 rows=69 width=23) (actual time=1872.942..1873.205 rows=46 loops=1)
-> Hash Join (cost=3.98..152724.60 rows=16048 width=23) (actual time=1.770..1760.872 rows=32715 loops=1)
Hash Cond: ("outer".creation_user = "inner".person_id)
-> Nested Loop (cost=0.00..152477.84 rows=16459 width=4) (actual time=0.684..1612.689 rows=32715 loops=1)
-> Index Scan using logger_entries_pk on logger_entries le (cost=0.00..95328.26 rows=16459 width=4) (actual time=0.443..786.734 rows=32715 loops=1)
Filter: (subplan)
SubPlan
-> Index Scan using logger_project_pkg_map_un on logger_project_pkg_map (cost=0.00..5.74 rows=2 width=0) (actual time=0.015..0.015 rows=1 loops=32715)
Index Cond: ((project_id = $0) AND (package_id = 784))
-> Index Scan using acs_objects_pk on acs_objects ao (cost=0.00..3.46 rows=1 width=8) (actual time=0.018..0.020 rows=1 loops=32715)
Index Cond: (ao.object_id = "outer".entry_id)
-> Hash (cost=3.81..3.81 rows=69 width=23) (actual time=0.996..0.996 rows=0 loops=1)
-> Index Scan using persons_pk on persons submitter (cost=0.00..3.81 rows=69 width=23) (actual time=0.321..0.738 rows=69 loops=1)
Total runtime: 1874.322 ms
(16 rows)

Any ideas?

Collapse
8: Re: Slow Query in Logger (response to 1)
Posted by Mark Aufflick on
I've always found the people on pgsql-performance mailing list very helpful. See:

http://www.postgresql.org/community/lists/

Collapse
9: Re: Slow Query in Logger (response to 1)
Posted by Jade Rubick on
I'm not sure how PostgreSQL handles the order of the where clauses (it may ignore them entirely), but it might also be worth switching around the order of the where clauses and see if that changes anything.
Collapse
10: Re: Slow Query in Logger (response to 1)
Posted by martin hebrank on
For reasons not quite logical, the order of the from statements seems to make more difference than the order of the where clause. Changing the from clause to be:
acs_objects ao,
logger_entries le,
persons submitter

will likely give better performance. On my tests it made the acs_objects scan use the index.

** Martin

Collapse
11: Re: Slow Query in Logger (response to 1)
Posted by Alex Kroman on
Martin,

You rock! Thanks for figuring that out. I've also added some code to memoize that call which combined with this optimized query is going to really improve the performance of logger.

Collapse
12: Re: Slow Query in Logger (response to 1)
Posted by martin hebrank on
Good. :-> I'm gonna need this for the site i'm working on: http://www.treffpunktsystems.com/