Forum OpenACS Q&A: PostgreSQL nested loops processing

Collapse
Posted by Vlad Seryakov on
Hello, i have RADIUS accounting table with about 2 millions records. I am trying to run report and i cannot change nested loop order, PG selects to run on smaller table(carriernetworks) but i want to make it run once on the huge one(radacct) and join with smaller one.

SELECT username,
round(extract(epoch from SUM(acctsessiontime))/3600,2) as usage
FROM radacct a,
carriernetworks n
WHERE framedipaddress <> '' AND
acctstoptime BETWEEN '2002-09-30 0:0' AND '2002-10-30 0:0' AND
framedipaddress::TEXT::INET << n.network AND
n.carrier = 'Partner'
GROUP BY username
HAVING sum(acctsessiontime)/3600 > 190;


NOTICE: QUERY PLAN:

Aggregate (cost=200254241.62..200254435.04 rows=2579 width=62)
-> Group (cost=200254241.62..200254306.10 rows=25789 width=62)
-> Sort (cost=200254241.62..200254241.62 rows=25789 width=62)
-> Nested Loop (cost=200000000.00..200252352.01 rows=25789 width=62)
-> Seq Scan on carriernetworks n (cost=100000000.00..100000004.65 rows=67 width=12)
-> Index Scan using radacct_stoptime_idx on radacct a (cost=0.00..3754.83 rows=770 width=50)

Collapse
Posted by Lachlan Myers on
I think specifying the CROSS JOIN instead of the comma in the "FROM" will over-ride the optimiser.
SELECT 
...
FROM radacct a CROSS JOIN carriernetworks n 
...
Collapse
Posted by Vlad Seryakov on
thanks, it worked but it took even more time than using subselect:

With CROSS JOIN

Aggregate (cost=9997.76..10191.18 rows=2579 width=62) (actual time=1076455.95..1084354.97 rows=129 loops=1)
-> Group (cost=9997.76..10062.23 rows=25789 width=62) (actual time=1076259.64..1081871.40 rows=1164108 loops=1)
-> Sort (cost=9997.76..9997.76 rows=25789 width=62) (actual time=1076259.61..1078002.04 rows=1164108 loops=1)
-> Nested Loop (cost=0.00..8108.15 rows=25789 width=62) (actual time=66.99..1038315.91 rows=1164108 loops=1)
-> Index Scan using radacct_stoptime_idx on radacct a (cost=0.00..3754.83 rows=770 width=50) (actual time=37.60..72251.53 rows=1372192 loops=1)
-> Seq Scan on carriernetworks n (cost=0.00..4.65 rows=67 width=12) (actual time=0.20..0.46 rows=67 loops=1372192)
Total runtime: 1084431.59 msec

with subselect:
Aggregate (cost=3908.42..3911.31 rows=38 width=35) (actual time=821347.39..829198.37 rows=131 loops=1)
-> Group (cost=3908.42..3909.38 rows=385 width=35) (actual time=821199.26..826744.54 rows=1174474 loops=1)
-> Sort (cost=3908.42..3908.42 rows=385 width=35) (actual time=821199.24..822917.79 rows=1174474 loops=1)
-> Index Scan using radacct_stoptime_idx on radacct a (cost=0.00..3891.89 rows=385 width=35) (actual time=8.50..782906.73 rows=1174474 loops=1)
SubPlan
-> Seq Scan on carriernetworks n (cost=0.00..5.71 rows=34 width=0) (actual time=0.51..0.51 rows=1 loops=1384735)
Total runtime: 829282.18 msec

Collapse
Posted by Andrei Popov on
How about removing a <>? I remember reading somewhere that in SQL "negative" statements (e.g. NOT NULL, NOT IN, <>) are sever performance drain -- it is easier to pick things that are equal, than not...