Forum OpenACS Q&A: OpenACS 4 production sites

Collapse
Posted by Jun Yamog on
Hi,

We have been running OpenACS 4 in some production sites.  Just would
like to get some opinion and give some feedback to the community.

The good

- Even at pre alpha the sites are still running well.

- Its proves that 4.x framework is productive too like 3.x framework.

The bad

- acs_objects is now at 40,000 rows in one of the sites.  Even with
purging of around 10K rows every now and then... its still big.

- production sites are running different code base.  We can't do an
update of the code and data model everyday.

I would like to discuss more of the acs_objects issue.  Is this really
an issue?  We have never pushed on any of our older sites a table that
grows around 10K per month.  A single table that grows 10K a month?!?
Are we safe here or what?  =)  So far the site is still humming ok,
hardware is taking a breeze.  When would you think we will some real
bad db impact on this? 500K mark?  If there is a significant db
performance impact do we still go with 4.x framework (central object
like data model) or do we step a little back and change some stuff
more like 3.x (separete data model)?

Other people who is running production OpenACS 4 sites please give
some info on their experiences.

Collapse
Posted by defunct defunct on
Not much help this, but in the past I have done some load testing on OACS 4 (earlier in its develpoment life) cos I had the same query.

It would appear (and I am not a PG expert) that given then unqiue keys in that table, the fact that most references to the table are against direct key values, and that the majority of accesses to that table appear to be inserts, I don't think its too much of an issue. Certainly Oracle doens't seem to experience significant degredation from high numbers.

I'm sure someone else here can be more definitive and will be better informed about PG.

Collapse
Posted by Don Baccus on
Where PG might degrade worse than Oracle would be with deletes and updates to the objects table.  But these are rare in practice, and inserts should be fine.

I don't personally worry about access times with relatively large tables as long as queries are optimized correctly and indexes used.  Remember that access is O(log2(N)) where N is the number of rows.  4 billion rows will require 32 "pokes" at the index to find a row, in other words, not so bad.  That's only twice as slow as the number of index "pokes" needed to find a row in a table with 65,536 rows.

The problem we could run into is that some queries in the system might be generating plans that force sequential table scans on acs_objects, and obviously that would be *very* bad.  If we run into such queries clearly we need to fix them, though.  This would be true even if we didn't make such heavy use of acs_objects.

Collapse
Posted by Jun Yamog on
Thanks Don,

You are a real geek.  So basically you use binary search of some form, silly me never though of it that way.

So I guess we just have to make sure that we use indexes if possible.  Then on our queries or joins we use the indexed columns.

Collapse
Posted by Don Baccus on
Yep, that's the thing to do!

Don't go index-crazy because maintaining on update/insert can become expensive.  But make enough to keep things running fast.

Collapse
Posted by Janne Blomqvist on
To be really pedantic about the indexing estimates... pgSQL uses B-trees as default. I don't know how many-way the trees actually are (i.e.
how many pointers do the nodes in the tree at most contain) but considering that most unix filesystems seem to use 4 Kilobyte (or should I
say Kibibyte to be use the official NIST definition...:) ) blocksize as default (i.e. the smallest amount of data that the filesystem will read at a
time) one might assume that M is quite big. If the pointers are stored as 4-byte integers one gets that M could be approx 1024, minus some
overhead. To be really safe let's say M=512. Then the number of disk accesses to find a row in a table of 4 billion records is not
log2(4e9)=32 as Don suggested but rather logM(4e9)=4!! Nice, huh? Of course, I'm no super-hacker and certainly no pgSQL-guru, so I
might just be
rambling...(as usual)...😊
Collapse
Posted by Don Baccus on
OK, I wasn't really speaking on disk access terms but rather "pokes" at the structure itself.  Hopefully we've got the entire index cached in RAM :)

In practice there's some overhead in the index structure, you need to differentiate between a leaf and a branch and other stuff.  I'm not quite sure how big the entries are but it is bigger than 4 bytes.

On the other hand the overhead for tuples in the table itself is a lot higher than metadata overhead in an index in PG's case, so the penalty for a sequential scan is worse than you might imagine ...

Collapse
Posted by Ken Kennedy on
but rather logM(4e9)=4!! Nice, huh?

Are you mad, man? 4!!=24!=620448401733239439360000. No way THAT's nice...*grin*. (It's a joke, ya see...)

Ah...math humor. And I'm assuming it's right...factorials of factorials parse that way. Anyone know for sure? (I'm too lazy to track down a reference, and I'm only about 80% sure). The funny thing is...I actually read it that way at first...I'm thinking "man, the factorial of four factorial...that's pretty big...oh *smack* those are exclamation points!"

Just some Friday night humor, folks.

Collapse
Posted by Hamilton Chua on
Thanks Jun for bringing this to the forums. Let me add some more specifics.

We're using OpenACS pre-Alpha on postgres 7.1.3.

The OpenACS 4 production site that I handle in particular uses
ACS-Message extensively and most of the 40,000 are actually acs_message objects.

Along the road to reaching the 40,000 + records on acs_objects we
noticed that restarting the server slows down on two querries ( I will
try to post them later ) even with regular nightly vacuuming.

Also, I have just recently found out that the Package Manager page takes a very long time to load.

The rest of the site is running splendidly except for the above two
observations.

Jun mentions that some of the above may have already been resolved in the newer CVS / tarballs of OpenACS but we are not 100% sure.

Many Thanks....

Collapse
Posted by Don Baccus on
Yeah, post the queries, please.  Also do an EXPLAIN on the queries and post the plan, too.  Nicely formatted with "pre" tags if you don't mind, makes it a lot easier to read!
Collapse
Posted by Hamilton Chua on

As requested, here are the two queries. Each query is followed by Explain Results.

Query :

select 1 as admin_exists_p
from dual
where exists (select 1
	      from acs_object_party_privilege_map m, users u
	      where m.object_id = 0
	      and m.party_id = u.user_id
	      and m.privilege = 'admin');

Explain Results:

NOTICE:  QUERY PLAN:

Result  (cost=0.00..0.00 rows=0 width=0)
  InitPlan
    ->  Nested Loop  (cost=811562.64..825535.26 rows=19 width=8)
          ->  Subquery Scan m  (cost=811562.64..825497.55 rows=185799 width=100)
                ->  Unique  (cost=811562.64..825497.55 rows=185799 width=100)
                      ->  Sort  (cost=811562.64..811562.64 rows=1857988 width=100)
                            ->  Append  (cost=8510.41..63616.72 rows=1857988 width=100)
                                  ->  Subquery Scan *SELECT* 1  (cost=8510.41..10142.16 rows=234019 width=68)
                                        ->  Merge Join  (cost=8510.41..10142.16 rows=234019 width=68)
                                              ->  Nested Loop  (cost=0.00..1486.94 rows=441 width=24)
                                                    ->  Index Scan using acs_privileges_pk on acs_privileges p1  (cost=0.00..6.23 rows=21 width=12)
                                                    ->  Seq Scan on acs_privileges p2  (cost=0.00..1.21 rows=21 width=12)
                                                    SubPlan
                                                      ->  Nested Loop  (cost=0.00..3.29 rows=1 width=36)
                                                            ->  Seq Scan on acs_privilege_hierarchy_index h1  (cost=0.00..1.64 rows=1 width=12)
                                                            ->  Seq Scan on acs_privilege_hierarchy_index h2  (cost=0.00..1.64 rows=1 width=24)
                                              ->  Sort  (cost=8510.41..8510.41 rows=11144 width=44)
                                                    ->  Hash Join  (cost=1025.15..7574.53 rows=11144 width=44)
                                                          ->  Seq Scan on acs_object_context_index  (cost=0.00..1857.48 rows=116948 width=8)
                                                          ->  Hash  (cost=1022.36..1022.36 rows=1118 width=36)
                                                                ->  Merge Join  (cost=715.46..1022.36 rows=1118 width=36)
                                                                      ->  Index Scan using acs_permissions_grantee_idx on acs_permissions p  (cost=0.00..224.89 rows=4702 width=20)
                                                                      ->  Sort  (cost=715.46..715.46 rows=1859 width=16)
                                                                            ->  Hash Join  (cost=253.93..614.52 rows=1859 width=16)
                                                                                  ->  Seq Scan on membership_rels mr  (cost=0.00..88.05 rows=4644 width=4)
                                                                                  ->  Hash  (cost=165.81..165.81 rows=6942 width=12)
                                                                                        ->  Seq Scan on group_element_index  (cost=0.00..165.81 rows=6942 width=12)
                                  ->  Subquery Scan *SELECT* 2  (cost=10920.40..12501.40 rows=148755 width=100)
                                        ->  Merge Join  (cost=10920.40..12501.40 rows=148755 width=100)
                                              ->  Nested Loop  (cost=0.00..1486.94 rows=441 width=24)
                                                    ->  Index Scan using acs_privileges_pk on acs_privileges p1  (cost=0.00..6.23 rows=21 width=12)
                                                    ->  Seq Scan on acs_privileges p2  (cost=0.00..1.21 rows=21 width=12)
                                                    SubPlan
                                                      ->  Nested Loop  (cost=0.00..3.29 rows=1 width=36)
                                                            ->  Seq Scan on acs_privilege_hierarchy_index h1  (cost=0.00..1.64 rows=1 width=12)
                                                            ->  Seq Scan on acs_privilege_hierarchy_index h2  (cost=0.00..1.64
                                              ->  Sort  (cost=10920.40..10920.40 rows=7084 width=76)
                                                    ->  Hash Join  (cost=3748.45..10292.74 rows=7084 width=76)
                                                          ->  Seq Scan on acs_object_context_index  (cost=0.00..1857.48 rows=116948 width=8)
                                                          ->  Hash  (cost=3746.67..3746.67 rows=711 width=68)
                                                                ->  Hash Join  (cost=296.97..3746.67 rows=711 width=68)
                                                                      ->  Hash Join  (cost=99.66..3443.33 rows=1182 width=48)
                                                                            ->  Nested Loop  (cost=0.00..2998.79 rows=4413 width=44)
                                                                                  ->  Seq Scan on rel_segments rs  (cost=0.00..1.04 rows=4 width=20)
                                                                                  ->  Index Scan using group_elem_idx_group_idx on group_element_index  (cost=0.00..113.85 rows=235 width=24)
                                                                                  SubPlan
                                                                                    ->  Nested Loop  (cost=0.00..5.39 rows=1 width=36)
                                                                                          ->  Index Scan using acs_object_types_pk on acs_object_types o1  (cost=0.00..2.01 rows=1 width=12)
                                                                                          ->  Seq Scan on acs_object_types o2  (cost=0.00..2.50 rows=50 width=24)
                                                                            ->  Hash  (cost=88.05..88.05 rows=4644 width=4)
                                                                                  ->  Seq Scan on membership_rels mr  (cost=0.00..88.05 rows=4644 width=4)
                                                                      ->  Hash  (cost=79.02..79.02 rows=4702 width=20)
                                                                            ->  Seq Scan on acs_permissions p  (cost=0.00..79.02 rows=4702 width=20)
                                  ->  Subquery Scan *SELECT* 3  (cost=10942.73..13020.91 rows=984035 width=52)
                                        ->  Merge Join  (cost=10942.73..13020.91 rows=984035 width=52)
                                              ->  Nested Loop  (cost=0.00..1486.94 rows=441 width=24)
                                                    ->  Index Scan using acs_privileges_pk on acs_privileges p1  (cost=0.00..6.23 rows=21 width=12)
                                                    ->  Seq Scan on acs_privileges p2  (cost=0.00..1.21 rows=21 width=12)
                                                    SubPlan
                                                      ->  Nested Loop  (cost=0.00..3.29 rows=1 width=36)
                                                            ->  Seq Scan on acs_privilege_hierarchy_index h1  (cost=0.00..1.64 rows=1 width=12)
                                                            ->  Seq Scan on acs_privilege_hierarchy_index h2  (cost=0.00..1.64 rows=1 width=24)
                                              ->  Sort  (cost=10942.73..10942.73 rows=46859 width=28)
                                                    ->  Hash Join  (cost=90.77..6684.95 rows=46859 width=28)
                                                          ->  Seq Scan on acs_object_context_index  (cost=0.00..1857.48 rows=116948 width=8)
                                                          ->  Hash  (cost=79.02..79.02 rows=4702 width=20)
                                                                ->  Seq Scan on acs_permissions p  (cost=0.00..79.02 rows=4702 width=20)
                                  ->  Subquery Scan *SELECT*  4  (cost=0.00..25059.59 rows=490971 width=52)
                                        ->  Nested Loop  (cost=0.00..25059.59 rows=490971 width=52)
                                              ->  Nested Loop  (cost=0.00..2892.65 rows=209 width=48)
                                                    ->  Nested Loop  (cost=0.00..2189.97 rows=10 width=36)
                                                          ->  Nested Loop  (cost=0.00..3.49 rows=1 width=28)
                                                                ->  Index Scan using acs_permissions_grantee_idx on acs_permissions p  (cost=0.00..2.01 rows=1 width=16)
                                                                ->  Seq Scan on acs_privileges p1  (cost=0.00..1.21 rows=21 width=12)
                                                          ->  Index Scan using acs_obj_ctx_idx_ancestor_idx on acs_object_context_index  (cost=0.00..2129.66 rows=4546 width=8)
                                                    ->  Seq Scan on acs_privileges p2  (cost=0.00..1.21 rows=21 width=12)
                                                    SubPlan
                                                      ->  Nested Loop  (cost=0.00..3.29 rows=1 width=36)
                                                            ->  Seq Scan on acs_privilege_hierarchy_index h1  (cost=0.00..1.64 rows=1 width=12)
                                                            ->  Seq Scan on acs_privilege_hierarchy_index h2  (cost=0.00..1.64 rows=1 width=24)
                                              ->  Seq Scan on users u  (cost=0.00..82.46 rows=2346 width=4)
                                  ->  Subquery Scan *SELECT* 5  (cost=0.00..2892.65 rows=209 width=48)
                                        ->  Nested Loop  (cost=0.00..2892.65 rows=209 width=48)
                                              ->  Nested Loop  (cost=0.00..2189.97 rows=10 width=36)
                                                    ->  Nested Loop  (cost=0.00..3.49 rows=1 width=28)
                                                          ->  Index Scan using acs_permissions_grantee_idx on acs_permissions p  (cost=0.00..2.01 rows=1 width=16)
                                                          ->  Seq Scan on acs_privileges p1  (cost=0.00..1.21 rows=21 width=12)
                                                    ->  Index Scan using acs_obj_ctx_idx_ancestor_idx on acs_object_context_index  (cost=0.00..2129.66 rows=4546 width=8)
                                              ->  Seq Scan on acs_privileges p2  (cost=0.00..1.21 rows=21 width=12)
                                              SubPlan
                                                ->  Nested Loop  (cost=0.00..3.29 rows=1 width=36)
                                                      ->  Seq Scan on acs_privilege_hierarchy_index h1  (cost=0.00..1.64 rows=1 width=12)
                                                      ->  Seq Scan on acs_privilege_hierarchy_index h2  (cost=0.00..1.64 rows=1 width=24)
          ->  Index Scan using users_pk on users u  (cost=0.00..2.02 rows=1 width=4)
  ->  Subquery Scan dual  (cost=0.00..0.00 rows=0 width=0)
        ->  Result  (cost=0.00..0.00 rows=0 width=0)

Query :

select site_node__url(n.node_id) as url, n.node_id, n.directory_p,
          n.pattern_p, n.object_id, o.object_type, n.package_key, n.package_id
from acs_objects o left outer join
        (select n.node_id, n.directory_p, n.pattern_p, n.object_id,
                p.package_key, p.package_id
           from site_nodes n, apm_packages p
          where n.object_id = p.package_id) n
         using (object_id)

Explain Results:

NOTICE:  QUERY PLAN:

Hash Join  (cost=4.09..4210.72 rows=45459 width=42)
  ->  Seq Scan on acs_objects o  (cost=0.00..1251.59 rows=45459 width=16)
  ->  Hash  (cost=4.05..4.05 rows=16 width=26)
        ->  Subquery Scan n  (cost=1.20..4.05 rows=16 width=26)
              ->  Hash Join  (cost=1.20..4.05 rows=16 width=26)
                    ->  Seq Scan on apm_packages p  (cost=0.00..1.33 rows=33 width=16)
                    ->  Hash  (cost=1.16..1.16 rows=16 width=10)
                          ->  Seq Scan on site_nodes n  (cost=0.00..1.16 rows=16 width=10)