Forum OpenACS Q&A: Outer Join

Collapse
Posted by Cristian Petrescu-Prahova on
I've noticed you proposed to replace outer joins by
select ...
union
select ...
where 0 = (select count (*) ...)
On the PostgreSQL site I found they suggest
select ...
union
select ...
where not exists (select * ...)
I have a feeling that the latter solution is better costwise.
Collapse
2: Response to Outer Join (response to 1)
Posted by Ben Adida on
This is true. We'll reissue a new guide to porting for OpenACS
4.0, which will probably need a lot of rewriting anyways.
Collapse
3: Response to Outer Join (response to 1)
Posted by Don Baccus on
I didn't realize Ben was the source of that :)

I've translated outer joins all along with "not exists", but this after all was a fairly recent (6.5.*) addition to PG.

Suppose I should sit down and read the manuals more closely.  The reality, though, is that the UNION constructs are going to be relatively inefficient, the "not exists" wins for those rows returned in the first clause, but all the others cost as much as doing "count(*)" since the db has to search all row candidates in order to return "false".

Still, it is a win ...

Collapse
4: Response to Outer Join (response to 1)
Posted by Cristian Petrescu-Prahova on
Actually the win is even smaller 😊

Without proper indexing both constructs are going to crawl.

With indexing the difference is tiny. Yet...

Collapse
5: Response to Outer Join (response to 1)
Posted by Don Baccus on
Postgres doesn't execute queries on index values alone, ala Oracle and many other more mature RDBMS systems.  So the count() really does chug through all the qualified rows (deciding which rows to chug through can make use of an index).

"not exists" will examine the same rows, but will stop the moment a row is found that matches the criteria.  count(*) will examine each row.

And the subselect is done many, many times for a typical UNION'd outer  join if the table's got a lot of rows in it.

The "not exists" form will always be slightly faster (as there's overhead with aggregate functions which need to be set up once, then called for each row) and sometimes much faster, depending on the actual data in the table.

I'm at a serious disadvantage here - I actually know how Postgres works.