Forum OpenACS Development: Overly use of ACS Objects - Consequences?

At the moment we are having some heated discussions in our company about the overuse of ACS Objects and possible ways to reduce it. This came up because two sites we are working on are expected to have a lot of users (500K) and/or communities (3000).

The major culprits identified by the "we want change" group have been the permission system (aka acs-rels) and the portal system (if I remember correctly, the number of objects is on the order of 5 times the number of user/community relationships. Assuming we have 40,000 users with an average of 5 community memberships each, this would be 1 million entries in ACS objects).
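As a rough check of that estimate, here is a minimal sketch of the arithmetic. The factor of 5 objects per membership is the from-memory figure quoted above, not a measured value, and the variable names are purely illustrative.

```python
# Back-of-the-envelope estimate of acs_objects rows created by memberships.
users = 40_000
memberships_per_user = 5      # average number of communities per user
objects_per_membership = 5    # from-memory factor quoted above (assumption)

memberships = users * memberships_per_user
estimated_objects = memberships * objects_per_membership

print(f"{memberships} memberships -> {estimated_objects} objects")
```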

My question now is: do we even have to bother? Is it a performance problem to have 3-5 million entries in ACS objects? And if we do change things (e.g. quickly hack the permission system to use its own tables), what would be the consequences for staying compatible with the main development of OpenACS? (As we already have the sites running, we need to write upgrade scripts anyway, so no worries there, Don :)).

And, last but not least, when we discussed splitting some parts of OpenACS off from ACS objects, the functionality built around acs_objects was mentioned a couple of times as a big benefit of using it. I might be naive here, but why not mirror the functionality that is needed in the other package's namespace?

Posted by Barry Books on
I've got over 10 million acs_objects currently in a 4.2 Oracle system. I don't have that many users but I do have over 1000 site nodes. We've had a few performance problems, but they have all been resolved when they occurred without changing the basic data model.

I think if you are expecting 5 million max then you don't have much to worry about. I'm expecting 50 million by the end of the year and the only real problem I have right now is the time required to import the object_context_index table. I expect to solve that by switching to RMAN for backups.

The advantage of the ACS data model is not speed but flexibility. If you are even considering ACS then you don't want to sacrifice that until you have to, and I suspect that time will never come. Also, don't underestimate the value of one sequence number across all objects and of tying everything together with rels. You can dig a lot of information out of the database in very general ways, which is very useful for admin and debugging pages. The only caveat is: don't lock the acs_objects table.

So, to answer your question: if you are using ACS, you can't overuse ACS objects. The consequences of not using them are far greater than those of overusing them. If you are certain the data model will be too slow, then you might want to consider a different toolkit (or a faster machine).

Posted by Don Baccus on
The portals rewrite that's in CVS reduces the number of objects.  Reducing them further would reduce the flexibility and usefulness of the portal system.

The permission system has nothing to do with acs-rels, so I don't understand the question.

Relational segments and groups do use the acs-rels portion of the datamodel, and I'd agree that there is no need in most (but not all) cases for each row in a relationship to be an object.  I wouldn't object to a rewrite of those two features to use non-object private tables to store the party-[group/rel] relationships.  Groups and relsegs themselves must be objects so we can use permissions on them to determine who can add or delete members, etc.

If I'd been more ambitious when I added denormalized tables to speed permission checking I might've tackled this myself.  However relsegs and groups provide fairly sophisticated functionality and a wide variety of views so a rewrite (including upgrade scripts) might be somewhat ambitious.

An example of where it helps that every row in a relationship is an object, and therefore visible to permissions, would be a general ratings system.  Each rating would be a row in acs-rels, and permissions would govern who could edit or delete existing ratings.

So we don't want to toss out acs-rels entirely.

On the other hand, I don't see the object cost of acs-rels as being a major concern.  Though I'd be open to a rewrite of relsegs and groups that doesn't use acs-rels but rather a private relationship system, I don't think there's much bang-for-the-buck to be gained here.

A million rows should be no big deal.  This is particularly true in Oracle (Postgres has a fairly high per-row overhead).  Even in PG you're talking 200-300 MB max for this table.  Presumably if you're going to support 500,000 users you understand that you need many GB of RAM no matter what technology you choose, and that getting rid of acs-objects isn't going to make that need go away.
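To put that figure in per-row terms, here is a small sketch of the arithmetic. It only divides the 200-300 MB estimate above by a million rows; the linear scaling to 5 million rows is an assumption for illustration, not a measurement.

```python
# Convert the quoted table-size estimate into an implied cost per row.
MB = 1024 * 1024
rows = 1_000_000
low_bytes, high_bytes = 200 * MB, 300 * MB   # 200-300 MB estimate from above

per_row_low = low_bytes // rows
per_row_high = high_bytes // rows
print(f"implied cost: {per_row_low}-{per_row_high} bytes per row")

# Assumption: size grows linearly with row count.
rows_5m = 5_000_000
size_5m_low_mb = per_row_low * rows_5m // MB
size_5m_high_mb = per_row_high * rows_5m // MB
print(f"5 million rows: roughly {size_5m_low_mb}-{size_5m_high_mb} MB")
```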

Now one thing you *can* do is to not join against the acs-objects table unless you actually need the information that is stored there.  It is the joining, not access per se, that generates costs that build up.  Identifying queries that are slow and optimizing them by various means is where you'll most impact performance IMO.
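A minimal sketch of that point, using Python's sqlite3 module and a heavily simplified, hypothetical schema (the notes table and the column set are made up for illustration; the real acs_objects table has many more columns):

```python
import sqlite3

# Hypothetical, simplified schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE acs_objects (
        object_id     INTEGER PRIMARY KEY,
        object_type   TEXT NOT NULL,
        creation_date TEXT
    );
    CREATE TABLE notes (
        note_id INTEGER PRIMARY KEY REFERENCES acs_objects(object_id),
        title   TEXT NOT NULL
    );
    INSERT INTO acs_objects VALUES (1, 'note', '2003-01-01'), (2, 'note', '2003-01-02');
    INSERT INTO notes VALUES (1, 'first'), (2, 'second');
""")

# Joins acs_objects although no acs_objects column is selected or filtered on.
with_join = conn.execute("""
    SELECT n.note_id, n.title
      FROM notes n JOIN acs_objects o ON o.object_id = n.note_id
     ORDER BY n.note_id
""").fetchall()

# Same rows from the package's own table alone, with no join.
without_join = conn.execute(
    "SELECT note_id, title FROM notes ORDER BY note_id"
).fetchall()

# Join only when a column that lives in acs_objects is actually needed.
with_dates = conn.execute("""
    SELECT n.title, o.creation_date
      FROM notes n JOIN acs_objects o ON o.object_id = n.note_id
     ORDER BY n.note_id
""").fetchall()
```

The first two queries return identical rows, so the join in the first one is pure overhead; only the third query has a reason to touch acs_objects.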

Back to permissions ... there's evidence posted by Dirk Gomez that makes us think that adding to Oracle the denormalized hierarchy table that already exists in PG would improve performance.
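For readers unfamiliar with the idea, here is a generic sketch of what a denormalized hierarchy table buys you. It is not the actual OpenACS table or code; the tree, the node names, and the function names are all hypothetical. The point is only that precomputing every (node, ancestor) pair turns an ancestor check into a single lookup instead of a walk up the tree.

```python
def build_closure(parent_of):
    """Return every (node, ancestor) pair, including (node, node) itself."""
    closure = set()
    for node in parent_of:
        current = node
        while current is not None:
            closure.add((node, current))
            current = parent_of.get(current)
    return closure

# Hypothetical hierarchy: child -> parent (None marks the root).
parent_of = {"root": None, "site": "root", "forum": "site", "message": "forum"}

# The "denormalized table": built once, maintained whenever the tree changes.
closure = build_closure(parent_of)

def is_ancestor(ancestor, node):
    """Single membership test instead of a recursive walk."""
    return (node, ancestor) in closure
```

The trade-off is the usual one for denormalization: more rows to store and keep in sync in exchange for cheaper reads.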

Also getting rid of all these silly foo_read, foo_write private permissions that are children of read and write, and just using read and write directly would help permissions run faster.  I just did this recently for the port of the 3.x events package.
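A rough sketch of why the extra level costs something, again with made-up names rather than the real permission code (the real privilege tree is larger than this): checking a child privilege such as foo_read means also checking every privilege above it, while checking read directly involves a shorter chain.

```python
# Hypothetical privilege tree: child -> parent. Holding a parent implies the child.
PARENT = {"foo_read": "read", "foo_write": "write", "read": None, "write": None}

def implying_privileges(privilege):
    """The privilege itself plus every ancestor that implies it."""
    chain = []
    while privilege is not None:
        chain.append(privilege)
        privilege = PARENT.get(privilege)
    return chain

def has_permission(grants, party, obj, privilege):
    """True if party holds privilege on obj directly or via an ancestor."""
    return any((party, obj, p) in grants for p in implying_privileges(privilege))

# Illustrative grant: party "alice" holds plain read on object 42.
grants = {("alice", 42, "read")}
```

With this setup, a check for foo_read has to consider two privileges (foo_read and read), while a check for read considers one; multiplied over many objects and checks, the private child privileges add work without adding much expressiveness.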