Forum OpenACS Q&A: an SQL question - urgent!

Collapse
Posted by Ravi Gadad on
this is a problem i need to work out within the next few hours, so i hope SOMEONE has some advice for me! i have a content_items table, a content_items_open_map table (just like the site_wide_category_map table, only this maps content_items to other tables, including categories), and a category_hierarchy table. here's the sql i'm querying with:

select distinct on (content_item_id) ci.content_item_id, 
ci.headline, ci.source_id, ci.summary_a, ci.summary_b, 
ci.summary_c, ci.review_score,
case when sysdate() between push_date and pull_date then 
push_weight + mapping_weight
else mapping_weight end as mapping_weight,
case when ch.child_category_id = 1 then 0 else 1 end as 
distance
from content_item_open_map ciom, content_items ci, 
category_hierarchy ch
where ciom.on_which_table = 'categories' and ciom.on_what_id 
= ch.child_category_id and (ch.parent_category_id = 1 or 
ch.child_category_id = 1)
and push_weight > 0 and ci.active_p = 't'
and ci.content_item_id = ciom.content_item_id
order by content_item_id, distance asc, mapping_weight desc, 
release_date desc, content_item_id limit 10;

in other words, i'm fetching all the content items that are mapped to a category OR its child categories, and trying to sort them by their mapping_weight (push_weight is just an addition to mapping weight) and distance (either 0 or 1) from the category specified (in this case, category_id 1). the problem is, i don't want duplicates, so i've selected with DISTINCT ON content_item_id. i can't do a select like this, however, without specifying content_item_id as the first argument to ORDER BY, which completely ruins my sorting.

anyone have any ideas? thank you in advance for your help!

Collapse
Posted by Don Baccus on
Why do you need to use "distinct on" rather than just "distinct"?

You'll return one row per content_item in each case, won't you?

Collapse
Posted by Ravi Gadad on
why i can't use distinct instead of distinct on -

if i use distinct, i won't get duplicate *rows.* however, there are content_items that are mapped to categories AND their subcategories, with different weights and different push_dates, pull_dates, etc. therefore, i'm obviously not going to have duplicate rows, since i can end up with different distances and mapping_weights for a given content_item.

i therefore need a way of eliminating duplicate content_ids in the content_id column, not just eliminating duplicate rows.

(i hope that made more sense.) :)

Collapse
Posted by Don Baccus on
Well, you have this many-to-one relationship that you're trying to map
on single rows created by "distinct on (content_item_id)".

Which push_weight do you expect the case statement to operate on for
that single row, if there are more than one category/subcategory
entries mapped to that row?

"group by" won't let you do this, it will force you to use an
aggregate function (count(), sum(), avg() etc) to collapse the
multiple push_weights into a single value.  You *have* to collapse the
multiple values somehow...

What do you plan to do with the push_weight values?  I assume this is
for a report or web page, right?  What do you plan to output?  That
might help me give you some more advice.

Collapse
Posted by Ravi Gadad on
don - thanks for looking into my problem.

first of all, everything in the case statement is obviously operating
on any row it queries and returns back to me - which seems
logical enough to me.  it seems to be working, in that regard - i
get one row for each content_item_id, and the distance and map
values are correct.  so the query is working fine, as it is, except
that the rows are only ordered by distance, map, or release date
within a given content_item_id. (i realize that i had two
content_item_id's in the order by clause, and i took the last one
out - it worked both ways, of course).  so i'm getting the RIGHT
rows, just not in the right overall order.

what i'm trying to do is this:  i'm displaying a listing of content
items to the user.  i know what category he's in, so i want to
display items of that category to him, and secondarily items of
the subcategories, if there aren't enough in the category itself.  i
need to display them in order of their importance in the given
category, which is optionally bolstered by a "push" that we enact
on certain important items.  items of the same mapping_weight
should be displayed in reverse chronological order, so if all
items are the same weight, the newest ones come out on top.

so far, i've been advised to encapsulate the entire query in the
from of another clause, which would be slow but it would work -
except that i'm using postgresql.

any other advice?

Collapse
Posted by Don Baccus on
Well, subselect in "from" was my first thought, too, but unlike Karl
Goldstein I knew that a) you were using Postgres and b) PG 7.0 doesn't
support subselects in "from" (PG 7.1 does, BTW, but it is pickier
about some error stuff that means that OpenACS doesn't run right out
of the box, our problem, not PG's).

So I didn't bother suggesting that solution.  Nor did I suggest
building a view which returns the distinct rows, because PG 7.0
doesn't support "distinct" in views, either.

I'll look again at this tomorrow...do you have a web page we can look
at that shows the proper values unordered?