Forum OpenACS Development: Joining to acs_objects and package-aware packages

The current documentation says:
Try not to write queries in your application that join against acs_objects. This means you should never use the fields in acs_objects for application-specific purposes. This is especially true for the context_id field.
Why not? I thought that was somewhat the point - consistent meta-data for all ACS Objects. Why not use acs_objects for creation_user and context_id? If not, what is the right way to store and retrieve the object owner and the context_id?

Also, what-all is entailed in making a "package-aware" package? That is, a package that can be installed into multiple points in a site map and have the data for each instance be separate? Is it filtering by package_id for each database call? Is there more? Is "package-aware" the right label?

(P.S. Is this forum supposed to be about development of OpenACS or development with OpenACS?)

Posted by Peter Marklund on
Joel,

"Try not to write queries in your application that join against acs_objects."

Remove this sentence as it is misleading. I think the key sentences in the document that convey what they are getting at are:

"Second, in order for this to work, the various parts of the OpenACS Objects data model must be interpreted in the same way by all applications that use the data model. Therefore, assigning any application-specific semantics to any part of the core data model is a bad thing to do, because then the semantics of the data model are no longer independent of the application."

I think what spurred this warning was that applications were using context_id to indicate object context (or navigation) hierarchy rather than merely permission inheritance.

My vote is to take out the last bullet point altogether. The issue with the context_id is not important enough to warrant being in the summary. Or, if it is to be in the summary, make the bullet say something like:

"Don't rely on the context_id column in the acs_objects table to signify the hierarchy of objects. The column is merely intended for inheritance of permissions"

I happen to be one of those people who think that the context_id column should in fact indicate *both* the permission inheritance *and* the object hierarchy. Sometimes when package developers misuse the core it's an indication that the core should change. I'm not sure exactly what needs to change though to enforce such new semantics of the column. Anyhow, until we have made that change, the documentation should reflect the old state of affairs.

Regarding this forum my understanding is that it is for the development *of* OpenACS. I think that includes package development and not only core development.

Posted by Dave Bauer on
Peter,

It would be nice to have something to organize the object hierarchy. I am not sure we want to have it tied to the permissions hierarchy. I think that the objects themselves maintain their hierarchy in the object-specific storage, for example content_repository and parent_id. Also, objects sometimes do not have context_id set.

Let's remember that the toolkit wasn't finished when OpenACS inherited it, so perhaps this issue was never addressed. So we'll have to figure this one out for ourselves.

Joel,

Making a package safe to install multiple instances mainly requires making sure each instance only manages objects that belong to it, using a package_id column. One other thing is if the package uses content folders, generally each instance should have its own folder.
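
As a rough illustration of the package_id filtering described above, here is a minimal Python/sqlite3 sketch. The notes table, its columns, and the helper names are hypothetical and are not taken from any OpenACS package; a real package would do the same thing in its own queries.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical notes table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """
        CREATE TABLE notes (
            note_id    INTEGER PRIMARY KEY,
            package_id INTEGER NOT NULL,  -- the package instance that owns the note
            title      TEXT NOT NULL
        )
        """
    )
    return db


def add_note(db, package_id, title):
    """Insert a note for one package instance and return its id."""
    cur = db.execute(
        "INSERT INTO notes (package_id, title) VALUES (?, ?)",
        (package_id, title),
    )
    return cur.lastrowid


def notes_for_instance(db, package_id):
    """Return only the titles that belong to the given package instance."""
    rows = db.execute(
        "SELECT title FROM notes WHERE package_id = ? ORDER BY note_id",
        (package_id,),
    )
    return [title for (title,) in rows]


db = make_db()
add_note(db, 101, "Intranet note")
add_note(db, 202, "Public site note")
add_note(db, 101, "Second intranet note")
print(notes_for_instance(db, 101))
```

Every read (and every update or delete) carries the same package_id condition, so two instances mounted at different points in the site map never see each other's rows.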

Posted by Don Baccus on
I think Peter's observation is correct ...

Now ... I've been doing some thinking about the context_id and object hierarchy issue over the last couple months, in particular when I did the permissions denormalization work for 4.6.1.

We don't really need context_id at all in objects, nor do we need security_inherit_p, because these values are denormalized in the context_object_hierarchy map which is managed by triggers on the objects table.  This denormalization was present in ACS 4.2.
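
To make the idea of a trigger-maintained hierarchy map concrete, here is a small Python/sqlite3 sketch. It is only an illustration under stated assumptions: the table name context_index, its columns, and the trigger are invented for the example, the trigger handles inserts only (not later changes of context), and security_inherit_p is ignored.

```python
import sqlite3

SCHEMA = """
CREATE TABLE objects (
    object_id  INTEGER PRIMARY KEY,
    context_id INTEGER REFERENCES objects (object_id)
);

-- Hypothetical flattened map: one row per (object, ancestor) pair.
CREATE TABLE context_index (
    object_id     INTEGER NOT NULL,
    ancestor_id   INTEGER NOT NULL,
    n_generations INTEGER NOT NULL,
    PRIMARY KEY (object_id, ancestor_id)
);

CREATE TRIGGER objects_context_insert AFTER INSERT ON objects
BEGIN
    -- Every object is its own ancestor at distance 0.
    INSERT INTO context_index VALUES (NEW.object_id, NEW.object_id, 0);
    -- Copy the ancestors of the context object, one generation further away.
    INSERT INTO context_index
    SELECT NEW.object_id, ancestor_id, n_generations + 1
      FROM context_index
     WHERE object_id = NEW.context_id;
END;
"""


def make_db():
    """Create the toy schema in an in-memory database."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def add_object(db, object_id, context_id=None):
    """Insert an object; the trigger fills in the flattened map."""
    db.execute(
        "INSERT INTO objects (object_id, context_id) VALUES (?, ?)",
        (object_id, context_id),
    )


def ancestors(db, object_id):
    """Read the whole context chain from the map, nearest first, without recursion."""
    rows = db.execute(
        "SELECT ancestor_id FROM context_index"
        " WHERE object_id = ? ORDER BY n_generations",
        (object_id,),
    )
    return [ancestor_id for (ancestor_id,) in rows]


db = make_db()
add_object(db, 1)       # root object, no context
add_object(db, 2, 1)    # context is object 1
add_object(db, 3, 2)    # context is object 2
print(ancestors(db, 3))
```

Once the map exists, a permission check can find every ancestor of an object with a single indexed lookup, which is why the context_id column itself becomes redundant for reads.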

My thinking has been ... why not recognize the fact that this denormalization is set in cement, because permissions won't scale without it (even with my changes to other parts of the datamodel)?  Then provide explicit API calls to flip security_inherit_p or to change the permissions context of an object?

We'd rid acs_objects of two unneeded columns (unneeded because of the denormalization) and forcing code to use an API rather than bash objects directly would make it relatively easy to enforce a convention that we do it via Tcl API ... which leads us further down the path towards a permissions API which is safe for caching ...

Then let's put in a real parent_id column - removing the then redundant parent_id column from content_item.

I think we could get away with setting the new parent_id column to the current context_id value (except for content_items which would get the existing parent_id.)  Non-content objects are already frequently using this field for that purpose, and if code isn't doing so it wouldn't be hurt by it because it would be "blind" to context_id.
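
The backfill rule in the previous paragraph could look roughly like the following Python/sqlite3 sketch. The two tables are simplified stand-ins with made-up sample rows, and the function name is hypothetical; nothing here is an actual OpenACS upgrade script.

```python
import sqlite3


def make_db():
    """Toy versions of the objects table and the content items table."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE acs_objects (
            object_id  INTEGER PRIMARY KEY,
            context_id INTEGER
        );
        CREATE TABLE cr_items (
            item_id   INTEGER PRIMARY KEY,
            parent_id INTEGER
        );
        INSERT INTO acs_objects VALUES (1, NULL);  -- root
        INSERT INTO acs_objects VALUES (2, 1);     -- non-content object
        INSERT INTO acs_objects VALUES (3, 1);     -- content item
        INSERT INTO cr_items VALUES (3, 2);        -- its parent is object 2
        """
    )
    return db


def migrate_parent_id(db):
    """Add parent_id to acs_objects and backfill it.

    Content items keep their existing parent_id; every other object
    gets its current context_id.
    """
    db.execute("ALTER TABLE acs_objects ADD COLUMN parent_id INTEGER")
    db.execute(
        """
        UPDATE acs_objects
           SET parent_id = CASE
               WHEN object_id IN (SELECT item_id FROM cr_items)
               THEN (SELECT c.parent_id
                       FROM cr_items AS c
                      WHERE c.item_id = acs_objects.object_id)
               ELSE context_id
           END
        """
    )


db = make_db()
migrate_parent_id(db)
print(db.execute("SELECT object_id, parent_id FROM acs_objects").fetchall())
```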

This is a fairly radical change but long-term would remove the temptation to use context_id as a parent_id without adding any additional bytes to an object.  And future packages would benefit from having an unambiguous column available for recording object parent-child relationships.

I also think we should put package_id in basic objects.  My objections to some of the extra columns proposed by Dirk Gomez don't extend to package_id as tracking content by package is a requirement for subsite-aware packages.

Posted by Peter Marklund on
Don,
I think your suggestion is outstanding! I very much like the idea of adding a parent_id column to acs_objects and having a package_id column there as well would be a very useful denormalization to package developers.

Concerning the acs object name and description debate I've started erring on the side of having such columns (or at least the name column). Lists with objects of different object types are sometimes useful, the prime example being site-wide search.  For such lists you must retrieve the names of the objects in a scalable fashion. I can only see that the current OpenACS datamodel fulfills such a use case if either of the following assumptions hold:

1) The acs_object.name function is performant enough to be used for listings of objects of mixed object types.

2) We intend to store all user submitted content in the CR (the CR has a name column for all content) regardless of whether that content needs versioning.

I am not convinced that either of the above assumptions hold. Unless I'm missing something my conclusion remains that a name column is a good idea.

Posted by Joel Aufrecht on
Since I haven't re-written the documentation for permissions or the object model or content repository, I don't understand them yet. (Yes, my dirty secret is that I'm only doing the documentation because the only way I can understand something is to document it :P) Could you guys please explain the ramifications of these changes at the basic development level? Specifically:
  • I create a notes table. Each record in the notes table is also an ACS object. In which field in which table should I store the note owner? The package instance the note is in?
  • I create a new permission for notes, e.g. "generate-email-alert". How do I indicate who does and does not have this permission for this note?
  • What is this "denormalization?"
Posted by Barry Books on
I think the sentence should be removed also. I remember reading it, taking it at face value and putting creation dates in other tables. Now I just join against it and I don't see any problems.

I'd also like to see context_id and inherit_p gone. They just take up space, and replacing them with parent_id would also solve the problem of using context_id as a parent_id. I'd like to see object_type replaced by object_type_id as well. The varchar(100) at the front of the table is a real space hog.

From what I've seen, the name function is not useful in a query that returns multiple rows. It's way too slow.

I created a subsite aware object_type and inherit from it if I need a subsite aware object. Seems to work ok but it's another join.

The real question is whether all this stuff should be in acs_objects, making the table bigger but the queries simpler, or in a bunch of related tables, meaning more joins but perhaps more tunable. I like the lots-of-tables approach because if you really use the object model you can add attributes to acs_objects but put them in another table and use get_attribute to retrieve values on queries that return one row, but still have the flexibility to tune multirow queries. You also don't have to worry about messing up something that does select * from acs_objects.

Posted by Ola Hansson on
Joel,

I will give it a shot and try to answer your three points above.

- The id of the user, or perhaps better, the party that owns the note should be recorded in an "owner_id" column in the notes table if you want to be able to change the assigned user. OTOH, if ownership doesn't ever need to be passed to anyone else but the user who created the note, it is probably okay to just join against acs_objects if you need to show that information.

- What you create is a "privilege". It's not entirely clear to me what the distinction between a privilege and a permission is ... I would like to express it as: "You grant the permission to exercise privilege X on object Y to party Z." However, that may be a bad interpretation and is certainly beside the point ...

The /permissions/ page is one place you can grant permissions. You may also do it directly from PL[PG]SQL when you create an object, for instance. Just use "acs_permission__grant_permission".

- An example of denormalization would be if one (as Barry proposes) would move the "object_type" column out of the acs_objects table and maintain the various object_types in a separate table (coincidentally such a table already exists) with (say) object_type_id as primary key. The removed column would be replaced by an "object_type_id" column pointing to the separate table.

At least this is my understanding of "denormalization". Someone please correct me if I'm wrong.

/Ola
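
The wording in the second point ("grant the permission to exercise privilege X on object Y to party Z") maps naturally onto a three-column table. The Python/sqlite3 sketch below is only a toy model of that idea: the table layout and the Python helpers are hypothetical, they are not the acs_permission PL/SQL API, and they cover direct grants only (no inheritance, no privilege hierarchy, no group membership).

```python
import sqlite3


def make_db():
    """One row per (object, party, privilege) grant."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """
        CREATE TABLE permissions (
            object_id  INTEGER NOT NULL,
            grantee_id INTEGER NOT NULL,
            privilege  TEXT NOT NULL,
            PRIMARY KEY (object_id, grantee_id, privilege)
        )
        """
    )
    return db


def grant_permission(db, object_id, grantee_id, privilege):
    """Record that the party may exercise the privilege on the object."""
    db.execute(
        "INSERT OR IGNORE INTO permissions VALUES (?, ?, ?)",
        (object_id, grantee_id, privilege),
    )


def revoke_permission(db, object_id, grantee_id, privilege):
    """Remove a direct grant, if present."""
    db.execute(
        "DELETE FROM permissions"
        " WHERE object_id = ? AND grantee_id = ? AND privilege = ?",
        (object_id, grantee_id, privilege),
    )


def permission_p(db, object_id, party_id, privilege):
    """True if the party holds a direct grant of the privilege on the object."""
    row = db.execute(
        "SELECT 1 FROM permissions"
        " WHERE object_id = ? AND grantee_id = ? AND privilege = ?",
        (object_id, party_id, privilege),
    ).fetchone()
    return row is not None


db = make_db()
# Note 500 may trigger email alerts on behalf of user 7.
grant_permission(db, 500, 7, "generate-email-alert")
print(permission_p(db, 500, 7, "generate-email-alert"))
print(permission_p(db, 500, 8, "generate-email-alert"))
```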

Posted by Tom Jackson on

Don,

I almost always use context_id to indicate which permission should be used for an object. How is this denormalized without this information in acs_objects? I guess what I am asking is: how would I need to change my application if context_id is removed from acs_objects? It seems like such a simple and convenient way of assigning permissions.

For instance, I recently created a simple accounting package for maintaining a General Ledger. Each set of books was called an accounting instance, with an instance_id (references acs_objects). I would assign permissions on this instance_id as required. Each instance would have a set of accounts, with account_id. These accounts would have the context_id set to the instance_id.
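
The ledger setup described above can be modelled in a few lines of Python/sqlite3. This is a simplified sketch, not the real permissions code: the schema, the ids, and has_permission are hypothetical, and the check walks the context_id chain directly instead of using a denormalized map.

```python
import sqlite3


def make_db():
    """Toy objects and permissions tables for the ledger example."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE objects (
            object_id          INTEGER PRIMARY KEY,
            context_id         INTEGER,
            security_inherit_p INTEGER NOT NULL DEFAULT 1
        );
        CREATE TABLE permissions (
            object_id  INTEGER NOT NULL,
            grantee_id INTEGER NOT NULL,
            privilege  TEXT NOT NULL
        );
        INSERT INTO objects VALUES (10, NULL, 1);  -- accounting instance
        INSERT INTO objects VALUES (11, 10, 1);    -- account, inherits
        INSERT INTO objects VALUES (12, 10, 0);    -- account, inheritance off
        INSERT INTO permissions VALUES (10, 7, 'read');
        """
    )
    return db


def has_permission(db, object_id, party_id, privilege):
    """Check the object, then each context in turn while inheritance is on."""
    current = object_id
    seen = set()  # guards against accidental cycles in context_id
    while current is not None and current not in seen:
        seen.add(current)
        granted = db.execute(
            "SELECT 1 FROM permissions"
            " WHERE object_id = ? AND grantee_id = ? AND privilege = ?",
            (current, party_id, privilege),
        ).fetchone()
        if granted:
            return True
        row = db.execute(
            "SELECT context_id, security_inherit_p FROM objects"
            " WHERE object_id = ?",
            (current,),
        ).fetchone()
        if row is None or not row[1]:
            return False  # unknown object, or inheritance switched off
        current = row[0]
    return False


db = make_db()
print(has_permission(db, 11, 7, "read"))  # granted on the instance, inherited
print(has_permission(db, 12, 7, "read"))  # inheritance is off for account 12
```

A single grant on the instance covers every account whose context_id points at it, which is the convenience the post describes.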

Posted by Jim Lynch on
Hi,
The current documentation says:
Try not to write queries in your application that join against acs_objects. This means you should never use the fields in acs_objects for application-specific purposes. This is especially true for the context_id field.
Why not?
One reason not to join with acs_objects that hasn't been mentioned is that it might be a performance hit, given that so many joins to that table are already going on. So the wisdom here is "leave it alone if you can".

Also, it's critical to separate the semantics of the core from the semantics of the app, because if you don't, then your application breaks when the core changes. Might that happen anyway? Sure. But you protect yourself better if you apply the software engineering concepts of information hiding and factoring to your code.

If you allow the core to hide its information and redundantly store it yourself, what you get is much more freedom from breakage, because you have separated the issues of your app from being affected by issues of the core.

If you have your app's procs and other entities factored properly, then even if you do experience breakage, you don't have as much work to do to unbreak it, since the breakage is likely factored into only one place.

Posted by Joel Aufrecht on
This is something that's causing me some conceptual difficulty. The problem is this: ACS introduces a fair amount of overhead, under the rubric of "doing things right the first time." But then a lot of the core system is incomplete, and so, in the worst-case scenario, you end up spending a lot of time and effort for nothing. A lot of people are doing very good work to finish packages and correct semi-implemented architecture; in the meantime, I'm trying to figure out how to get the best tradeoffs for my own work, and how to document them for others to make informed tradeoffs. And this acs_core stuff is a key point.

Making some of the records in your new package acs_objects takes a lot of effort. A bunch of extra code in the database creation scripts, and then many relational integrity links that can make adding and deleting very frustrating if anything goes wrong. What do you get back in return? Access to a fairly clean, simple, and powerful permissions API. And a consistent way to store and access meta-data. What meta-data? Name, owner, creation and modified date, creation ip.

Okay, now how do you access this meta-data? One way is to join, but there is an argument that joining directly breaks information hiding. Another way is to call a function, but as far as I know a function only exists for name, and I've read posts that say it's too inefficient for big queries.

So my questions are:

What's the best way to get to this meta-data now? If there are several, what's the quick rule of thumb to decide which is appropriate?

What architectural stuff is proposed or already happening to change this?

URL seems like another good piece of data to have universally available. I see another thread about how to build this efficiently. How can we move forward on this?

Posted by Jim Lynch on
Part of the answer lies in determining what purpose an individual query has in gaining access to the metadata in acs_objects.

If you are going to assume some non-standard interpretation of an acs_objects field, then don't join; store the value yourself instead.

On the other hand, if you -strictly- interpret fields in acs_objects according to the original meaning and intent, then a join is ok. This means avoiding application-specific interpretations entirely.
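
Put together with the earlier owner_id suggestion, the rule might be sketched like this in Python/sqlite3. The schema is a cut-down stand-in with invented sample data: owner_id is the application's own, changeable notion of ownership and lives in the notes table, while creation_user and creation_date are read from acs_objects in their original meaning only.

```python
import sqlite3


def make_db():
    """Toy acs_objects and notes tables with one sample note."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE acs_objects (
            object_id     INTEGER PRIMARY KEY,
            creation_user INTEGER,
            creation_date TEXT
        );
        CREATE TABLE notes (
            note_id  INTEGER PRIMARY KEY REFERENCES acs_objects (object_id),
            owner_id INTEGER,      -- application-specific, may be reassigned
            title    TEXT NOT NULL
        );
        INSERT INTO acs_objects VALUES (1, 42, '2003-01-15');
        INSERT INTO notes VALUES (1, 43, 'Groceries');
        """
    )
    return db


def note_with_metadata(db, note_id):
    """Join a note to acs_objects to pick up its standard creation metadata."""
    row = db.execute(
        """
        SELECT n.title, n.owner_id, o.creation_user, o.creation_date
          FROM notes AS n
          JOIN acs_objects AS o ON o.object_id = n.note_id
         WHERE n.note_id = ?
        """,
        (note_id,),
    ).fetchone()
    if row is None:
        return None
    title, owner_id, creation_user, creation_date = row
    return {
        "title": title,
        "owner_id": owner_id,
        "creation_user": creation_user,
        "creation_date": creation_date,
    }


db = make_db()
print(note_with_metadata(db, 1))
```

Here the creator (42) and the current owner (43) differ, which is exactly the case where reusing creation_user as "owner" would be an application-specific reinterpretation.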

This is going to hold true so long as you believe your interpretation of acs_objects field values will continue to hold over the life of the app.

Many people are advocating quick changes to the data model, a surefire way to keep the core unstable. On top of that, interpretations on fields in acs_objects, once correct assumptions, could become package breakers, as a result of changes in how data in core tables is expected to be interpreted.

Lack of maintained documentation will extend that effect over the entire time docs continue to be ignored: if you can't find out the correct way to interpret fields in tables you don't create, how can you build anything that uses those tables and have a reasonable expectation it will work?

What I've said for the acs_objects table holds true for all other tables in the core kernel, and really for any table that you did not create.

Posted by Carl Coryell-Martin on
Ola,
your example is a case of normalization. At the simplest level of interpretation, information is normalized when each piece of information exists in exactly one place.  In your example, it is a normalization because you are pulling the repeated object_type name out and replacing it with a pointer to the object type table.
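
A small Python/sqlite3 sketch of that normalization step follows. The table names (objects_flat, object_types, objects) and the sample rows are made up for the illustration.

```python
import sqlite3


def make_flat_db():
    """Start with the type name repeated on every object row."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE objects_flat (
            object_id   INTEGER PRIMARY KEY,
            object_type TEXT NOT NULL
        );
        INSERT INTO objects_flat VALUES (1, 'note');
        INSERT INTO objects_flat VALUES (2, 'user');
        INSERT INTO objects_flat VALUES (3, 'note');
        """
    )
    return db


def normalize_object_types(db):
    """Store each type name once and point at it by id."""
    db.executescript(
        """
        CREATE TABLE object_types (
            object_type_id INTEGER PRIMARY KEY,
            object_type    TEXT NOT NULL UNIQUE
        );
        INSERT INTO object_types (object_type)
        SELECT DISTINCT object_type FROM objects_flat;

        CREATE TABLE objects AS
        SELECT f.object_id, t.object_type_id
          FROM objects_flat AS f
          JOIN object_types AS t ON t.object_type = f.object_type;
        """
    )


def type_of(db, object_id):
    """Recover the type name through the pointer, at the cost of a join."""
    row = db.execute(
        """
        SELECT t.object_type
          FROM objects AS o
          JOIN object_types AS t ON t.object_type_id = o.object_type_id
         WHERE o.object_id = ?
        """,
        (object_id,),
    ).fetchone()
    return row[0] if row else None


db = make_flat_db()
normalize_object_types(db)
print(type_of(db, 3))
```

Denormalization goes the other way: it deliberately keeps a redundant copy of information (such as a flattened ancestor map) so that reads avoid joins or recursion.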

cheers,
carl

Posted by Ola Hansson on
It doesn't surprise me in the least that I had it all mixed up. :-b

Sorry folks!

Thanks for correcting me, Carl.