Forum OpenACS Q&A: SQL: Order by and Group by

Collapse
Posted by Bob OConnor on

I'm making modifications to the bboard system. I want to present a list of message threads with the most recent reply on top. So here is the situation: In a message thread, there is a new msg_id for each new message. If it is part of a thread, each root_msg_id is the same as the first message (the question).

msg_id   root_msg_id  subject         date
-----------------------------------------------   
000001     000001     The question    Oct  1
000004     000001     First Reply     Oct  5
000029     000001     A later Reply   Nov 15
-----------------------------------------------   

I want to keep this thread together yet sort my list so that this message would be be on top because the most recent reply was today. I can do an

order by posting_date desc, root_msg_id
But this doesn't work because the thread isn't together. I need only one message from a thread to post to my list. Another approach would be to select the messages by posting date and somehow ignore or "de-select" earlier messages in the thread that come up later in the select. I can't figure out how to ignore these earlier messages. The Group by command doesn't work because everything has to be the same in the group which means I can't use posting_date and other fields....

What I really want is a SQL statement that gets ALL the recent messages, but only ONE message, the most recent one from a thread and reverse orders them with the most recent one first.

Suggestions appreciated.
TIA
-Bob

Collapse
Posted by Todd Gillespie on
I'm kind of fried today, but a quick fix would be to spread this out over 2 queries & a little munging in the Tcl layer:
set selection [ns_db select $db "select distinct root_msg_id from bboard"]
foreach rmi [blahblah $selection] {
ns_write [database_to_tcl_string $db "
   select * from (
     select message from bboard where root_msg_id = $rmi 
        order by posting_date desc
     ) where rownum = 1"
   ]
}
or whatever the pgSQL variant of rownum is
Did you already go over something like this?
You could probably combine the queries, but, as I said, I'm not all here today. I'm also a little unclear on your requirement here - could you elaborate on your final paragraph? ALL & ONE are sort of opposite in my mind; also 'them' I'm a little unclear on - is 'them' all the posts in a thread, or do you want to order the set of most recent posts?

Collapse
Posted by Phil McTimoney on
Try this:
select root_msg_id, msg_id, posting_time
from   bboard
where  msg_id in
      (select max(msg_id) from bboard group by root_msg_id)
order  by posting_time;
My bboard table looks like this...

select root_msg_id, msg_id, posting_time
from   bboard
order by root_msg_id;

 root_msg_id | msg_id |      posting_time      
-------------+--------+------------------------
 000132      | 000132 | 2000-10-31 13:35:04+00
 000133      | 000133 | 2000-10-31 18:24:20+00
 000135      | 000135 | 2000-11-03 11:51:43+00
 000136      | 000136 | 2000-11-03 11:54:27+00
 00013D      | 00013N | 2000-11-16 12:22:29+00
 00013D      | 00013D | 2000-11-08 13:23:01+00
 00013F      | 00013M | 2000-11-09 11:35:11+00
 00013F      | 00013L | 2000-11-08 14:06:44+00
 00013F      | 00013I | 2000-11-08 13:52:41+00
 00013F      | 00013F | 2000-11-08 13:27:52+00
 00013G      | 00013J | 2000-11-08 13:57:05+00
 00013G      | 00013G | 2000-11-08 13:33:03+00
 00013H      | 00013K | 2000-11-08 14:03:06+00
 00013H      | 00013H | 2000-11-08 13:35:32+00
(14 rows)
And that command gives me the following results...
 root_msg_id | msg_id |      posting_time     
-------------+--------+-----------------------
 000132      | 000132 | 2000-10-31 13:35:04+00
 000133      | 000133 | 2000-10-31 18:24:20+00
 000135      | 000135 | 2000-11-03 11:51:43+00
 000136      | 000136 | 2000-11-03 11:54:27+00
 00013G      | 00013J | 2000-11-08 13:57:05+00
 00013H      | 00013K | 2000-11-08 14:03:06+00
 00013F      | 00013M | 2000-11-09 11:35:11+00
 00013D      | 00013N | 2000-11-16 12:22:29+00
(8 rows)
There may be better solutions...I'm used to using Sybase which has some non-standard behavior in terms of groups by's and having's.

Cheers,

Collapse
Posted by Bob OConnor on

Thank you Todd and Phil. "Use the source, Bob". So I did, and found part of my solution in the q-and-a-new-answers.tcl file. They use a view of the bboard table and join it to bboard:

View "bboard_new_answers_helper"
  Attribute   |   Type    | Modifier 
--------------+-----------+----------
 root_msg_id  | char(6)   | 
 topic_id     | integer   | 
 posting_time | timestamp | 

View definition: SELECT b.root_msg_id, b.topic_id, 
b.posting_time FROM bboard b WHERE (b.refers_to NOTNULL);

And the select:

select bnah.root_msg_id,count(*) as n_new,max(bnah.posting_time) 
as max_posting_time, to_char(max(bnah.posting_time),'YYYY-MM-DD') 
as max_posting_date, bboard.one_line as subject_line, 
bboard.user_id as poster_id
from bboard_new_answers_helper as bnah, bboard
where sysdate()::date - bnah.posting_time::date < 7
and bnah.root_msg_id = bboard.msg_id
and bnah.topic_id >0
group by bnah.root_msg_id, bboard.one_line, bboard.user_id
order by max_posting_time desc;

This does most of what I need. It seems to allow other fields in the select that are not part of the group by. I'll make a few modifications and do more testing so that I understand it.

-Bob

Collapse
Posted by Don Baccus on
The other fields are aggregates (max, in this case; count's another common one).  The columns that are selected when "group by" is used must either appear in the "group by" or in a call to an aggregate function.

This makes a lot of sense if you consider how "group by" works.  If you select a column that's not in the "group by", and there's more than one row in that grow, which column value should be returned?  The  aggregate's OK because it's defined to operate on all of the rows within a particular group, returning a single value.