Forum OpenACS Development: Slow queries on Bug-Tracker. Some SQL Tunning

1: Slow queries on Bug-Tracker. Some SQL Tunning

Posted by Eduardo Santos on 03/09/10 09:13 PM

Hi everybody,

I've been lately working on bug-tracker to make it more accessible to users, as we are using it as an error report tool. However, in the way things are now, it's completely impossible for them to just access the index page, because it's taking incredible 50 seconds (yes, fifty seconds) to load.

In order to improve it, I've just recently upgraded to the newest version on HEAD. I've found some small bugs, such as upgrade scripts on wrong folders, and some other issues, and I've also realized someone has been doing some work on it. There were three very slow queries, wich seem to be reduced to only one right now. The query I'm talking about must look like this:

select q.*, km.keyword_id, assign_info.* from ( select b.bug_id, b.project_id, b.bug_number, b.summary, lower(b.summary) as lower_summary, b.comment_content, b.comment_format, b.component_id, to_char(b.creation_date,'YYYY-MM-DD HH24:MI:SS') as creation_date, to_char(b.creation_date, 'YYYY-MM-DD HH24:MI:SS') as creation_date_pretty, b.creation_user as submitter_user_id, submitter.first_names as submitter_first_names, submitter.last_name as submitter_last_name, submitter.email as submitter_email, lower(submitter.first_names) as lower_submitter_first_names, lower(submitter.last_name) as lower_submitter_last_name, lower(submitter.email) as lower_submitter_email, st.pretty_name as pretty_state, st.short_name as state_short_name, st.state_id, st.hide_fields, b.resolution, b.found_in_version, b.fix_for_version, b.fixed_in_version, cas.case_id from bt_bugs b, acs_users_all submitter, workflow_cases cas, workflow_case_fsm cfsm, workflow_fsm_states st where submitter.user_id = b.creation_user and cas.workflow_id = '1500160' and cas.object_id = b.bug_id and cfsm.case_id = cas.case_id and cfsm.parent_enabled_action_id is null and st.state_id = cfsm.current_state and cfsm.current_state = '4'

) q left outer join cr_item_keyword_map km on (bug_id = km.item_id) left outer join (select cru.user_id as assigned_user_id, aa.action_id, aa.case_id, wa.pretty_name as action_pretty_name, p.first_names as assignee_first_names, p.last_name as assignee_last_name from workflow_case_assigned_actions aa, workflow_case_role_user_map cru, workflow_actions wa, persons p where aa.case_id = cru.case_id and aa.role_id = cru.role_id and cru.user_id = p.person_id and wa.action_id = aa.action_id ) assign_info on (q.case_id = assign_info.case_id) order by bug_number desc

I've put it on explain analyze and this is what I've found:

"Nested Loop Left Join (cost=6549.55..113191.28 rows=221 width=3204) (actual time=900.479..41514.489 rows=35 loops=1)" " Join Filter: (cas.case_id = assign_info.case_id)" " -> Nested Loop (cost=4.95..159.62 rows=1 width=2338) (actual time=0.408..38.840 rows=35 loops=1)" " -> Nested Loop (cost=4.95..158.53 rows=1 width=1186) (actual time=0.403..38.408 rows=35 loops=1)" " -> Nested Loop (cost=4.95..152.23 rows=1 width=1168) (actual time=0.392..37.839 rows=35 loops=1)" " -> Nested Loop (cost=4.95..145.99 rows=1 width=1140) (actual time=0.380..37.314 rows=35 loops=1)" " -> Nested Loop (cost=4.95..139.34 rows=1 width=1136) (actual time=0.365..36.695 rows=35 loops=1)" " Join Filter: (cfsm.case_id = cas.case_id)" " -> Nested Loop Left Join (cost=4.95..134.38 rows=1 width=1132) (actual time=0.310..21.372 rows=236 loops=1)" " -> Nested Loop (cost=4.95..134.05 rows=1 width=1128) (actual time=0.296..20.012 rows=236 loops=1)" " Join Filter: (cas.object_id = b.bug_id)" " -> Index Scan Backward using bt_bugs_bug_number_idx on bt_bugs b (cost=0.00..123.79 rows=236 width=1124) (actual time=0.021..0.369 rows=236 loops=1)" " -> Materialize (cost=4.95..4.96 rows=1 width=8) (actual time=0.000..0.032 rows=236 loops=236)" " -> Seq Scan on workflow_cases cas (cost=0.00..4.95 rows=1 width=8) (actual time=0.012..0.097 rows=236 loops=1)" " Filter: (workflow_id = 1500160)" " -> Index Scan using cr_item_keyword_map_pk on cr_item_keyword_map km (cost=0.00..0.32 rows=1 width=8) (actual time=0.004..0.004 rows=1 loops=236)" " Index Cond: (b.bug_id = km.item_id)" " -> Seq Scan on workflow_case_fsm cfsm (cost=0.00..4.95 rows=1 width=8) (actual time=0.007..0.056 rows=35 loops=236)" " Filter: ((parent_enabled_action_id IS NULL) AND (current_state = 4))" " -> Index Scan using users_pk on users u (cost=0.00..6.63 rows=1 width=4) (actual time=0.012..0.014 rows=1 loops=35)" " Index Cond: (u.user_id = b.creation_user)" " -> Index Scan using persons_pk on persons pe (cost=0.00..6.22 rows=1 width=28) (actual time=0.010..0.012 rows=1 loops=35)" " Index Cond: (pe.person_id = u.user_id)" " -> Index Scan using parties_pk on parties pa (cost=0.00..6.29 rows=1 width=30) (actual time=0.010..0.013 rows=1 loops=35)" " Index Cond: (pa.party_id = pe.person_id)" " -> Seq Scan on workflow_fsm_states st (cost=0.00..1.07 rows=1 width=1156) (actual time=0.003..0.008 rows=1 loops=35)" " Filter: (4 = state_id)" " -> Hash Join (cost=6544.60..111857.00 rows=52060 width=458) (actual time=400.595..1184.905 rows=119 loops=35)" " Hash Cond: (COALESCE(dep.deputy_user_id, u.user_id) = p.person_id)" " -> Hash Join (cost=4221.18..108297.15 rows=52060 width=434) (actual time=398.557..1182.511 rows=119 loops=35)" " Hash Cond: ((wcrpm.case_id = wcea.case_id) AND (wcrpm.role_id = wa.assigned_role))" " -> Hash Join (cost=4183.83..93405.47 rows=716687 width=16) (actual time=398.522..1182.158 rows=472 loops=35)" " Hash Cond: (pamm.member_id = u.user_id)" " -> Merge Join (cost=29.68..72230.01 rows=716687 width=12) (actual time=395.941..1178.950 rows=472 loops=35)" " Merge Cond: (pamm.party_id = wcrpm.party_id)" " -> Index Scan using party_approved_member_map_pk on party_approved_member_map pamm (cost=0.00..59210.21 rows=1066296 width=8) (actual time=0.014..933.179 rows=1059493 loops=35)" " -> Sort (cost=29.68..30.86 rows=472 width=12) (actual time=0.013..0.128 rows=472 loops=35)" " Sort Key: wcrpm.party_id" " -> Seq Scan on workflow_case_role_party_map wcrpm (cost=0.00..8.72 rows=472 width=12) (actual time=0.009..0.151 rows=472 loops=1)" " -> Hash (cost=3222.25..3222.25 rows=74552 width=8) (actual time=90.017..90.017 rows=74552 loops=1)" " -> Hash Left Join (cost=13.00..3222.25 rows=74552 width=8) (actual time=0.028..56.866 rows=74552 loops=1)" " Hash Cond: (u.user_id = dep.user_id)" " -> Seq Scan on users u (cost=0.00..2929.52 rows=74552 width=4) (actual time=0.011..23.485 rows=74552 loops=1)" " -> Hash (cost=12.80..12.80 rows=16 width=8) (actual time=0.002..0.002 rows=0 loops=1)" " -> Seq Scan on workflow_deputies dep (cost=0.00..12.80 rows=16 width=8) (actual time=0.001..0.001 rows=0 loops=1)" " Filter: ((now() >= start_date) AND (now() <= end_date))" " -> Hash (cost=33.74..33.74 rows=240 width=434) (actual time=0.994..0.994 rows=240 loops=1)" " -> Hash Join (cost=10.13..33.74 rows=240 width=434) (actual time=0.300..0.821 rows=240 loops=1)" " Hash Cond: (wcea.action_id = wa.action_id)" " -> Hash Join (cost=7.31..27.62 rows=240 width=12) (actual time=0.212..0.592 rows=240 loops=1)" " Hash Cond: (wcea.case_id = c.case_id)" " -> Seq Scan on workflow_case_enabled_actions wcea (cost=0.00..17.01 rows=240 width=8) (actual time=0.013..0.263 rows=240 loops=1)" " Filter: ((NOT completed_p) AND assigned_p)" " -> Hash (cost=4.36..4.36 rows=236 width=4) (actual time=0.184..0.184 rows=236 loops=1)" " -> Seq Scan on workflow_cases c (cost=0.00..4.36 rows=236 width=4) (actual time=0.009..0.068 rows=236 loops=1)" " -> Hash (cost=2.65..2.65 rows=14 width=430) (actual time=0.075..0.075 rows=14 loops=1)" " -> Hash Join (cost=1.32..2.65 rows=14 width=430) (actual time=0.044..0.057 rows=14 loops=1)" " Hash Cond: (wa.action_id = wa.action_id)" " -> Seq Scan on workflow_actions wa (cost=0.00..1.14 rows=14 width=8) (actual time=0.006..0.008 rows=14 loops=1)" " -> Hash (cost=1.14..1.14 rows=14 width=422) (actual time=0.020..0.020 rows=14 loops=1)" " -> Seq Scan on workflow_actions wa (cost=0.00..1.14 rows=14 width=422) (actual time=0.004..0.011 rows=14 loops=1)" " -> Hash (cost=1391.52..1391.52 rows=74552 width=28) (actual time=71.027..71.027 rows=74552 loops=1)" " -> Seq Scan on persons p (cost=0.00..1391.52 rows=74552 width=28) (actual time=0.009..24.650 rows=74552 loops=1)" "Total runtime: 41514.945 ms"

It seems like this query has two big troubles:

1 - The Index Scan on party_approved_member_map is incredibly slow.

2 - The Hash Join with the users table is also taking a long time

Most of the problems come from the huge amount of users we have (around 65.000), but I guess this performance can be improved. I've tried some configurations tuning on postgresql.conf, but I wasn't very successfull. So, can anybody give me a help on that? I believe there are other systems with the same amount of users using bug-tracker, and I don't see how it can be fast enough in order to be useable.

Any help is appreciated.

3: Re: Slow queries on Bug-Tracker. Some SQL Tunning (response to 2)

Posted by Eduardo Santos on 03/15/10 02:52 PM

Hi Dave,

I don't think I can improve this by tuning. I've already played with the indexes, changed the shared buffers, etc. I've got to the amount of 3GB for shared buffers, wich is close to the best results I've got grom pgbench. The machine is a dedicated IBM Server with 16GB RAM and two Quad Cores (Linux sees it as 8 processors). If there's any way to improve this query with server tuning, it's beyond my knowledge to do it.

That's why I've thought about tuning the query, wich seems to be a complex task. Do you think there's any way it cab be tuned? I couldn't find any looking at the data model.

5: Re: Slow queries on Bug-Tracker. Some SQL Tunning (response to 1)

Posted by Rocael Hernández Rizzardini on 03/16/10 01:20 AM

Which version of PG are you using?, left outer join in the planner prediction is known to be slow in some versions.

7: Re: Slow queries on Bug-Tracker. Some SQL Tunning (response to 6)

Posted by Nathan Lunt on 03/16/10 01:39 PM

I don't have a lot of time to delve into this right now, but I wanted throw this out. Is there a chance that performance may be improved by doing two separate queries and appending the data to the list as necessary? Sorry I can't find the answer for myself, but I have seen situations where this is the case.

9: Re: Slow queries on Bug-Tracker. Some SQL Tunning (response to 1)

Posted by Dave Bauer on 03/16/10 02:21 PM

You might try the geqo_threshhold setting increasing it to 24 or so to see if it speeds up this particular query.

geqo_threshold (integer)

Use genetic query optimization to plan queries with at least this many FROM items involved. (Note that a FULL OUTER JOIN construct counts as only one FROM item.) The default is 12. For simpler queries it is usually best to use the deterministic, exhaustive planner, but for queries with many tables the deterministic planner takes too long.

See: http://www.postgresql.org/docs/8.2/static/runtime-config-query.html#RUNTIME-CONFIG-QUERY-GEQO

11: Re: Slow queries on Bug-Tracker. Some SQL Tunning (response to 10)

Posted by Eduardo Santos on 03/22/10 12:55 PM

Hi Dave,

Thank you very much for your tips. PostgreSQL really have some issues that I didn't know about. I was thinking that the optimizer was not taking the best way for this query, but it's very difficult to find this out.

I'm out of access to the database right now, but as soon as I can I'll do this tuning and post the results back here.

Thank you very much again.