Forum OpenACS Development: Re: Fixing a very slow query

Posted by russ m on
I've got no idea if you still care about this, but...

it's not the sort, it's the big outer nested loop that's killing you... the first result is returned 19.707ms after it starts, and the last is almost 26 seconds later... the outer sort starts ~1ms after the nested loop completes and takes less than a millisecond...

the innermost seq. scan on cr_folders is the real issue - it's cheap but is executed 5301 times in that example... it's also odd that cr_folders is being seq. scanned when they're all doing is a direct match against folder_id - is there any chance that cr_folders.folder_id is not indexed, or the db hasn't been vacuumed in a while?

a rough guess is (depending on your PG version and it's planner) you might see a worthwhile improvement by shifting the "and not exists (select 1 from ...)" condition above "and exists (select 1 from cr_folders f where f.folder_id = i.item_id)" and see if the planner is smart enough to take the hint about checking item_id and tree_sortkey before seq. scanning cr_folders to see if the item in question happens to be a folder...

Posted by Richard Hamilton on
I thought that the query planner was supposed to be clever enough to select the best method! I also have always been led to believe that 'exists' is superior to 'in' because the subquery need only return a boolean value rather that requiring a sequential search through the subquery output.

I remember asking questions about optimising the permissions queries in OpenACS after seeing some very strange query times. It was Don who explained that 'vacuum analyse' in postgres collects stats and allows the planner to make much better decisions.

The combination of indices on the integer keys and multiple runs of vacuum analyse should result in a much improved execution plan.


Posted by Don Baccus on
Statistics-based optimization will always be fooled at times, unfortunately.

Regarding "exists" vs. "in", you're probably remembering advice from long ago, as the PG people figured out how to implement "in" as a special kind of join-like operation some years ago, thus making it as efficient in PG as it had been in Oracle.

In the past "in" was always extremely slow.

"exists" should still be faster, though.

Posted by Janine Ohmer on
Thanks! Although I eventually punted on that particular query, your post was helpful to me in reading the explain output for another one just a few days ago. I find the documentation for explain to be rather... opaque, to say the least.