Forum OpenACS Q&A: outer joins, views, and indexes in PG

1: outer joins, views, and indexes in PG

Posted by Jonathan Ellis on 06/13/02 11:22 PM

At carnageblender, I've normalized the weapons data with a "weapon_types" table containing the description ("long sword"), base cost, etc. and a "weapons" table containing the specific name ("glamdring"), enchantments, et. al. Each weapon belongs to a minion.

(code excerpts are also here b/c of the textarea-added linebreaks I can't seem to avoid on the bboard.)

create table weapon_types (
       id                  serial primary key,
       class               varchar(6) not null check (class in 
('ranged', 'melee', 'ammo')),
       ...
);

create table weapons (
       id             integer primary key,
       weapon_type_id integer not null references weapon_types,
       minion_id      integer references minions(id),
       ...
);
create index weapons_by_minion on weapons(minion_id);
create index weapons_by_type on weapons(weapon_type_id);

Since I hate retyping joins, I created a view on these tables:

create view weapon_and_type_v as
select ...
from weapons w, weapon_types wt
where w.weapon_type_id = wt.id

Each minion can hold one of each class of weapon. Rather than some truly awful grouping, I do a heinous denormalization (in a view):

create view minion_weapons_denorm_v as
select m.id as minion_id,
        w1....
        w2....
        w3....
from minions m
     left outer join weapon_and_type_v w1 on (m.id = w1.minion_id and 
w1.class = 'ammo')
     left outer join weapon_and_type_v w2 on (m.id = w2.minion_id and 
w2.class = 'melee')
     left outer join weapon_and_type_v w3 on (m.id = w3.minion_id and 
w3.class = 'ranged')

The problem is selecting from this last view is abominably slow. The planner for some reason can't see that the m to w1..w3 joins can use the weapons_by_minion index to vastly narrow down the amount of weapons it needs to scan. It prefers to do a seq scan on weapons for each of these:

Nested Loop  (cost=4.88..4044.44 rows=1 width=268)
  ->  Nested Loop  (cost=3.25..2696.97 rows=1 width=180)
        ->  Nested Loop  (cost=1.62..1349.50 rows=1 width=92)
              ->  Index Scan using minions_pkey on minions m  
(cost=0.00..2.03 rows=1 width=4)
              ->  Subquery Scan w1  (cost=1.62..1144.10 rows=13558 
width=116)
                    ->  Hash Join  (cost=1.62..1144.10 rows=13558 
width=116)
                          ->  Seq Scan on weapons w  
(cost=0.00..260.58 rows=13558 width=56)
                          ->  Hash  (cost=1.50..1.50 rows=50 width=60)
                                ->  Seq Scan on weapon_types wt  
(cost=0.00..1.50 rows=50 width=60)
        ->  Subquery Scan w2  (cost=1.62..1144.10 rows=13558 
width=116)
              ->  Hash Join  (cost=1.62..1144.10 rows=13558 width=116)
                    ->  Seq Scan on weapons w  (cost=0.00..260.58 
rows=13558 width=56)
                    ->  Hash  (cost=1.50..1.50 rows=50 width=60)
                          ->  Seq Scan on weapon_types wt  
(cost=0.00..1.50 rows=50 width=60)
  ->  Subquery Scan w3  (cost=1.62..1144.10 rows=13558 width=116)
        ->  Hash Join  (cost=1.62..1144.10 rows=13558 width=116)
              ->  Seq Scan on weapons w  (cost=0.00..260.58 
rows=13558 width=56)
              ->  Hash  (cost=1.50..1.50 rows=50 width=60)
                    ->  Seq Scan on weapon_types wt  (cost=0.00..1.50 
rows=50 width=60)

turning off seqscan doesn't help:

Nested Loop  (cost=0.00..10806.02 rows=1 width=268)
  ->  Nested Loop  (cost=0.00..7204.69 rows=1 width=180)
        ->  Nested Loop  (cost=0.00..3603.36 rows=1 width=92)
              ->  Index Scan using minions_pkey on minions m  
(cost=0.00..2.03 rows=1 width=4)
              ->  Subquery Scan w1  (cost=0.00..3397.96 rows=13558 
width=116)
                    ->  Merge Join  (cost=0.00..3397.96 rows=13558 
width=116)
                          ->  Index Scan using weapons_by_type on 
weapons w  (cost=0.00..3221.31 rows=13558 width=56)
                          ->  Index Scan using weapon_types_pkey on 
weapon_types wt  (cost=0.00..6.55 rows=50 width=60)
        ->  Subquery Scan w2  (cost=0.00..3397.96 rows=13558 
width=116)
              ->  Merge Join  (cost=0.00..3397.96 rows=13558 
width=116)
                    ->  Index Scan using weapons_by_type on weapons 
w  (cost=0.00..3221.31 rows=13558 width=56)
                    ->  Index Scan using weapon_types_pkey on 
weapon_types wt  (cost=0.00..6.55 rows=50 width=60)
  ->  Subquery Scan w3  (cost=0.00..3397.96 rows=13558 width=116)
        ->  Merge Join  (cost=0.00..3397.96 rows=13558 width=116)
              ->  Index Scan using weapons_by_type on weapons w  
(cost=0.00..3221.31 rows=13558 width=56)
              ->  Index Scan using weapon_types_pkey on weapon_types 
wt  (cost=0.00..6.55 rows=50 width=60)

Any ideas how I can convince/coerce PG into using the weapons_by_minion index here?

2: Response to outer joins, views, and indexes in PG (response to 1)

Posted by Don Baccus on 06/13/02 11:32 PM

There have been many heated arguments over this issue in the PG hacker's list, with the person who works on the optimizer being somewhat stubbornly hard to convince that the kind of exponential decay in execution time one sees with queries such as yours means that in most real-life questions it's better to err on the side of using an index than not. The penalty of being wrong in that direction generally isn't nearly as bad as being wrong in the direction of nested sequential scans. And of course the existence of an index might be considered a hint on the part of the datamodel designer that the index is expected to be used whenever it makes sense to do so.

But, as I say, the optimizer guy has stubbornly resisted efforts to change his opinion.

You can turn off sequential scans before the query is executed via a set command of some sort, check the docs. This will force the use of the index.

3: Response to outer joins, views, and indexes in PG (response to 1)

Posted by David Walker on 06/13/02 11:59 PM

Perhaps writing as subselects or as functions would do the trick.
Either way you would have a select done for each row but if that
still might be the fastest way to do it.

4: Response to outer joins, views, and indexes in PG (response to 1)

Posted by Jonathan Ellis on 06/14/02 06:07 PM

I did turn seqscan off and it did force index use -- but not the "right" one. But it uses other indexes on the base table the view is part of, which is strange to me.

I don't see how to substitute a subselect for the outer join here. Looks like my only option is to get rid of the weapon_and_type_v; the optimizer does a much better job with the base tables.

5: Response to outer joins, views, and indexes in PG (response to 1)

Posted by Jonathan Ellis on 06/14/02 10:55 PM

that didn't work either; something about it is convincing the optimizer it needs to scan all 13k rows of the weapons table.

I created a materialized view of weapons and weapon_types and that fixed this particular problem but when I start join this view (minion_weapons_denorm_v, that is) with others the same thing starts happening. Apparently something in the nature of these joins is confusing hell out of the planner and it's joining the entire minions and weapons table together before applying the minion_id constraint that cuts it down to one row. (!)

6: Response to outer joins, views, and indexes in PG (response to 1)

Posted by Jonathan Ellis on 06/15/02 12:07 AM

aha...

the second problem resulted from this view being joined with another (minion_spells_denorm_v) that also used minions as a base table. Apparently that confused the planner into thinking it needed to do the cross join.