Forum OpenACS Development: Best way to get name of last modified content item?

I need to retrieve the name of the content_item that was modified for a specific content type. I have two solutions, but I'm not sure if they are identical.

a) select name from cr_items where latest_revision = (select max(latest_revision) from cr_items where content_type = 'contact');

b) select name from cr_items i, acs_objects o where i.item_id = o.object_id and i.content_type='contact' and o.last_modified = (s\elect max(last_modified) from acs_objects o,cr_items i where i.content_type='contact' and o.object_id=i.item_id);

Inherently I trust the query b) to be more accurate, but I think that a) is sufficient, assuming that latest_revision will always be updated if a new version of the cr_item is uploaded, even if the revision is not life.

Two questions:

1) Is my assumption correct that query a) is sufficient?
2) Is there a better way to do this?

Another idea (probably worse); create a trigger on cr_items to update a auxilliary table whenver a 'contact' is updated?
Malte,

Isn't there a plpgsql call to return this info? If not perhaps there should be.

R.

Collapse
Posted by Jeff Davis on
the latest_revision is trigger maintained by an insert trigger on cr_revisions so the first one is better. It is likely to cause a table scan on cr_revisions though since it uses the max function. Here are some numbers from openacs.org for looking up the latest bug revision (this is with ~20k cr_items):
explain analyze select name from cr_items where latest_revision = (
select max(latest_revision) from cr_items
 where content_type = 'bt_bug_revision');
                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using cr_items_by_latest_revision on cr_items  (cost=509.17..513.41 rows=1 width=18) (actual time=65.113..65.120 rows=1 loops=1)
   Index Cond: (latest_revision = $0)
   InitPlan
     ->  Aggregate  (cost=509.17..509.17 rows=1 width=4) (actual time=64.865..64.866 rows=1 loops=1)
           ->  Seq Scan on cr_items  (cost=0.00..503.70 rows=2187 width=4) (actual time=0.203..59.467 rows=2369 loops=1)
                 Filter: ((content_type)::text = 'bt_bug_revision'::text)
 Total runtime: 65.292 ms
And here is the better version:
explain analyze select name from cr_items where latest_revision = (
select latest_revision from cr_items where content_type =
  'bt_bug_revision' order by latest_revision desc limit 1);
                                                                            QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using cr_items_by_latest_revision on cr_items  (cost=0.43..4.67 rows=1 width=18) (actual time=2.029..2.033 rows=1 loops=1)
   Index Cond: (latest_revision = $0)
   InitPlan
     ->  Limit  (cost=0.00..0.43 rows=1 width=4) (actual time=1.986..1.987 rows=1 loops=1)
           ->  Index Scan Backward using cr_items_by_latest_revision on cr_items  (cost=0.00..936.33 rows=2187 width=4) (actual time=1.980..1.980 rows=1 loops=1)
                 Filter: ((content_type)::text = 'bt_bug_revision'::text)
 Total runtime: 2.143 ms
so using order and limit instead of max is about 30x faster :)

As a general rule you want to avoid these sorts of functions in postgres if you can since they typically trigger table scans and you want to use an index if one exists.

Thanks a lot Jeff, this is really valuable advise.
Collapse
Posted by Don Baccus on
In general, in Postgres, the ORDER BY/LIMIT trick Jeff's using is the recommended way (by the PG people) to get around their poor implementation of max().

It's not really so much that the implementation is poor, as that it is implemented by a generalized, extensible aggregate function API that can not be optimized. However ... it's a bit surprising that they haven't special-cased max() and min(), transforming it to ORDER BY/LIMIT at least in cases that the optimizer could analyze and confirm that the query's equivalent.

In Oracle, max() is much less spendy because Oracle will scan the index itself and ignore the underlying table unless it needs to grab additional info from the table to satisfy the query ...

If my assumption is correct that "select ... order by .. asc/desc limit 1" works on Oracle as well, should we make this a best practise for OpenACS to deal with min/max?