Is there a way you can get distinct rows in the first query without actually using distinct?
Yes, Don mentioned a way above, the query I used was a nasty mess, I cleaned it up and got it to work, but I wanted to learn what was going on and learning I am!
does the count(*) improve after you run it a second time, i.e. after presumably the data is already in memory?
No it does not seem to improve. Overall on this system when I run queries they do not seem to be any faster the next time around. It seemed to improve a bit with a higher value for the -B setting for PG.
Have you done a "vacuum analyze"?
Yep did that, do it every night too.
Do you have indices created on functions?
No indices on this table yet, the table does not have a primary key yet. This query is step 1 in fixing that problem. I am not sure how one would place an index on a function.
Why can't you just do this?
select count(*), sid, membid, date, time
from logs
group by sid, membid, date, time
having count(*) > 1
I did do that and it ran, I was just very curious as to why it failed. It took a long time, about 3 minutes to run the query above but that is fine.
The whole reason I had to run this query is because this table has no primary key and I needed to figure out how to select every distinct row. Winds up there is no way to distiguish a row, there are massive amounts of duplicate values, so time for me to fix that problem.
This is the joy of adopting an ill designed table, now I get to clean it up.
I never realized how poorly configured this box was for PG. I have done what Don suggested and moved the shared memory buffer up to 3000 * 8192, 24mb. Evidently the kernel for this box is only set to handle 24mb or so as a max. I really want to learn more about this, it seems very important. Now that I am getting more and more into PG the more I want to understand it. Where can I get the in depth details on the -S and -B swicthes for PG and the linux kernel settings? I need some resources on what these different memory areas are for etc.
Thanks for the advice, it was great.
-Jeff