Forum OpenACS Development: Lacking views in CR

Collapse
Posted by Malte Sussdorff on
Currently the content repository is missing a view for all "latest_revisions" and for "live_revisions".

I want to create a TIP and implement this, adding the following views:

< mycontenttype>_latest (for a list of all items with the latest revision)
< mycontenttype>_live (for a list of all items with the live revision)

Does this make sense to you? Is the naming fine? Any other comments?

Collapse
2: Re: Lacking views in CR (response to 1)
Posted by Dave Bauer on
There are two options.

1) Make a new view that contains live_revision and latest_revision
2) Add these columns to the cr_revisionsx select view

I don't think we really need two seperate views. Something like

Select * from cr_revisionsl where revision_id=latest_revision will select the latest revisions

and
select * from cr_revisionsl where revision_id=live_revision will select the live revisions

Since we are already joining with cr_items in the cr_revisionsx view, it seems that we can just add it to that view, although that would break all the code that looks like this:

select * from cr_revisionsx cr, cr_items ci where ci.live_reivsion=cr.revision_id

Collapse
Posted by Nis Jørgensen on
An observation related to the performance discussion:

I ran the following queries on a reasonably large Oracle database (45000 items, aroudn 100000 revisions)

SELECT count(*)
FROM cr_items ci, cr_revisions cr
WHERE ci.item_id = cr.item_id AND
ci.live_revision = cr.revision_id ;

Took 8 seconds

SELECT count(*)
FROM cr_items ci, cr_revisions cr
WHERE
ci.live_revision = cr.revision_id ;

Took 0 seconds 😊

This is a little worrying, since the first version is de facto what we would be doing if the view was extended to include live_revision, and this was then used to find the live ones.

I think it is quite obvious what happens here: Oracle isn't using the best index available, because it doesn't know of the implicit assumption that the live revision of an item is a revision of exactly that item.

Solution: Tell Oracle about the logical link between live revisions and their associated items:

/* This constraint logically follows from the uniqueness of revision_id. But Oracle is too stupid to see that*/

ALTER TABLE cr_revisions ADD (CONSTRAINT cr_revisions_rev_item_unq
UNIQUE (revision_id, item_id));

/* This is what we really want to tell Oracle */

ALTER TABLE cr_items
ADD (CONSTRAINT cr_items_live_item_fk
FOREIGN KEY (live_revision, item_id)
REFERENCES cr_revisions (revision_id, item_id));

After adding this constraints, both the above queries return "immediately".

The second constraint could logically replace the current foreign key constraint on live_revision (cr_items_live_fk) - since it is in fact a strengthening of that constraint. It might be good to test performance of existing code.

All the above goes for latest revisions as well

Looking at postgres, it looks like we are NOT using declarative referential integrity for this case at all - but are instead relying on triggers. There is a comment saying:

-- postgresql RI bug causes multiple failures with regards to deletion of
-- content_revisions (DanW mailto:dcwickstrom@earthlink.net)

Does anyone know

* Which bug this is?
* If it has been fixed, and if so in which version?

Collapse
3: Re: Lacking views in CR (response to 1)
Posted by Malte Sussdorff on
In this case, wouldn't it be possible to have a view which contains the additional fields:

- live_revision_p
- last_revision_p

Not sure though how we could formulate the view, probably with a decode, which I'm not so sure works best.

Collapse
4: Re: Lacking views in CR (response to 1)
Posted by Jeff Davis on
It would be faster to just have live_revision and latest_revision in the view rather than something which would turn into:
where 
  decode(live_revision = revision_id,1,'t','f') = 't'
or something like that as it would be a where clause with a function which, as well all know, sucks bigtime.
Collapse
6: Re: Lacking views in CR (response to 1)
Posted by Dave Bauer on
Strange. I get the exact opposite results with a default PostgreSQL install:

The view:

explain analyze SELECT count(*)
FROM cr_items ci, cr_revisions cr
WHERE ci.item_id = cr.item_id AND
ci.live_revision = cr.revision_id ;

QUERY PLAN
-------------------------------------------------------------------------------\
---------------------------------------------------
 Aggregate  (cost=2192.73..2192.73 rows=1 width=0) (actual time=575.284..575.286 rows=1 loops=1)
   ->  Hash Join  (cost=622.84..2192.72 rows=1 width=0) (actual time=129.234..532.460 rows=16350 loops=1)
         Hash Cond: (("outer".item_id = "inner".item_id) AND ("outer".revision_id = "inner".live_revision))
         ->  Seq Scan on cr_revisions cr  (cost=0.00..806.97 rows=18397 width=8) (actual time=0.081..72.097 rows=18403 loops=1)
         ->  Hash  (cost=470.89..470.89 rows=16989 width=8) (actual time=128.085..128.085 rows=0 loops=1)
               ->  Seq Scan on cr_items ci  (cost=0.00..470.89 rows=16989 width=8) (actual time=0.080..75.305 rows=16992 loops=1)
 Total runtime: 576.018 ms
(7 rows)

explain analyze SELECT count(*)
FROM cr_items ci, cr_revisions cr
WHERE ci.live_revision = cr.revision_id ;
                                                            QUERY PLAN
-------------------------------------------------------------------------------\
----------------------------------------------------
 Aggregate  (cost=2045.51..2045.51 rows=1 width=0) (actual time=2722.408..2722.410 rows=1 loops=1)
   ->  Hash Join  (cost=572.36..2004.47 rows=16418 width=0) (actual time=219.867..1689.530 rows=16352 loops=1)
         Hash Cond: ("outer".revision_id = "inner".live_revision)
         ->  Seq Scan on cr_revisions cr  (cost=0.00..806.97 rows=18397 width=4) (actual time=0.069..174.145 rows=18403 loops=1)
         ->  Hash  (cost=470.89..470.89 rows=16989 width=4) (actual time=219.544..219.544 rows=0 loops=1)
               ->  Seq Scan on cr_items ci  (cost=0.00..470.89 rows=16989 width=4) (actual time=0.076..151.643 rows=16992 loops=1)
 Total runtime: 2722.725 ms
(7 rows)

Also you do not need to specify cr.item_id=ci.item_id in the first query. Where cr.revision_id=ci.live_revision will only return at mot one row.

Collapse
Posted by Nis Jørgensen on
Dave, your result is not so strange, since the foreign key reference is not present in postgres.

My guess is that adding the simple rreference will give the same results as Oracle, while the multi-field version will make them behave identically.

Regarding the last comment: That is the entire point, right? You don't need the extra condition, but in many cases you inherit it from underlying views. And it slows things down for Oracle.

/Nis

Collapse
8: Re: Lacking views in CR (response to 1)
Posted by Dave Bauer on
Right. In PostgreSQL you can see explicity that the hash join is different for the two queries:

With the view:

" Hash Cond: (("outer".item_id = "inner".item_id) AND ("outer".revision_id = "inner".live_revision))"

with the join on cr_items, cr_revisions

"Hash Cond: ("outer".revision_id = "inner".live_revision)"

It would be interesting to see how Oracle plans the two different queries.

Here is the second query rewritten to add item_id to the where clause

explain analyze select count(*) from cr_items ci, cr_revisions cr where ci.item_id=cr.item_id and ci.live_revision=cr.revision_id;
                                                            QUERY PLAN
-------------------------------------------------------------------------------\
---------------------------------------------------
 Aggregate  (cost=2200.15..2200.15 rows=1 width=0) (actual time=560.144..560.146 rows=1 loops=1)
   ->  Hash Join  (cost=623.38..2200.14 rows=1 width=0) (actual time=128.619..517.290 rows=16383 loops=1)
         Hash Cond: (("outer".item_id = "inner".item_id) AND ("outer".revision_id = "inner".live_revision))
         ->  Seq Scan on cr_revisions cr  (cost=0.00..809.69 rows=18469 width=8) (actual time=0.076..72.671 rows=18469 loops=1)
         ->  Hash  (cost=471.25..471.25 rows=17025 width=8) (actual time=127.608..127.608 rows=0 loops=1)
               ->  Seq Scan on cr_items ci  (cost=0.00..471.25 rows=17025 width=8) (actual time=0.090..75.695 rows=17025 loops=1)
 Total runtime: 561.095 ms
(7 rows)

Collapse
9: Re: Lacking views in CR (response to 1)
Posted by Dave Bauer on
Surprisingly according to my last post, the additional where ci.item_id=cr.item_id makes PostgreSQL faster. So it looks like this performance issue is unique to Oracle. The query still takes .5 secs, but we know count is not fast on PostgreSQL under an circumstances.