Forum OpenACS Improvement Proposals (TIPs): TIP #42 (Implemented) Adding package_id to acs_objects

Proposal

Adding package_id to acs_objects. See here

Reason

Almost all objects are created by a package instance and thus belong to a package, but every package has its very own way of storing the information about which objects belong to which instance. This information is sometimes very hard to obtain, and where a package does not store it at all, it is almost impossible to derive through the context hierarchy.

Storing the information about which package created an object in a central place would enable central services such as categories or search to list results with the name of the package instance that's able to display this object (in the rare case where multiple packages could deal with the same object it really doesn't hurt to show just the creator package).

Disadvantages

package_id is not available for some internal objects such as acs_rels, groups or users, therefore null values will have to be used for these types. Since the number of these objects on a reasonable site is almost negligible, I do not think this is a great disadvantage.

Solution

All packages need to be changed to store the package_id of newly created objects in acs_objects (this needs an extended interface for acs_object.new). Upgrade scripts need to be written. acs_objects needs to be extended with the column "package_id" of type integer, referencing apm_packages, without (!!) a "not null" constraint.
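
A minimal sketch of the proposed change, using Python's sqlite3 module as a stand-in for the real PostgreSQL/Oracle data model. The two-column tables and the helper new_object (a hypothetical stand-in for the extended acs_object.new) are simplifications for illustration, not the actual OpenACS definitions.

```python
import sqlite3


def make_schema():
    """Build a toy in-memory schema with the proposed nullable package_id column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE acs_objects (
            object_id   INTEGER PRIMARY KEY,
            object_type TEXT NOT NULL
        );
        CREATE TABLE apm_packages (
            package_id    INTEGER PRIMARY KEY REFERENCES acs_objects (object_id),
            instance_name TEXT NOT NULL
        );
        -- The proposed change: package_id on acs_objects, deliberately nullable.
        ALTER TABLE acs_objects
            ADD COLUMN package_id INTEGER REFERENCES apm_packages (package_id);
        """
    )
    return conn


def new_object(conn, object_type, package_id=None):
    """Hypothetical stand-in for an extended acs_object.new that accepts package_id."""
    cur = conn.execute(
        "INSERT INTO acs_objects (object_type, package_id) VALUES (?, ?)",
        (object_type, package_id),
    )
    return cur.lastrowid


conn = make_schema()

# A package instance is itself an object; it is created first.
forums_id = new_object(conn, "apm_package")
conn.execute("INSERT INTO apm_packages VALUES (?, ?)", (forums_id, "Forums"))

# An object created by that package instance records the package_id.
message_id = new_object(conn, "forums_message", package_id=forums_id)

# Internal objects such as users have no package, so the column stays NULL.
user_id = new_object(conn, "user")
```

Leaving the column nullable is what lets the last call succeed without inventing a package for the user object.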

Will do the core changes and the changes to news, forums, blogger, file-storage, mailing-lists, categories and someone else should do it on the other packages.

Approve
Posted by Dave Bauer on
Should the type-specific tables drop package_id?
Posted by Timo Hentschel on
Short-term: no. Mid-term: yes. This requires going through all queries and is more work than is needed at first.
Posted by Tom Jackson on

The idea of mapping objects to packages is good. Unfortunately what is proposed isn't a map. A map would allow zero, one or multiple associations of an object to various packages, which exactly corresponds to the possibilities. In the thread referenced above, Don said that only one 'package' could create an object, thus the meaning of the package_id is that it refers to the package that created the object. Is this going to be the meaning? Of course, anything created via a UI was created by a package. Is this going to be reflected in the acs_objects.package_id?

Posted by Dave Bauer on
Tom, interesting questions.

I think the package_id to define where an object belongs almost works.

I like the content repository model of parent_id better.

This builds an explicit object hierarchy that is often, but not always reflected in acs_objects.context_id.

I would love to explore the possibilities of setting up an object hierarchy for all objects similar to the cr_item.item_id, cr_item.parent_id model. Unfortunately it seems like a huge amount of work, and package_id would help with some queries where we want to know which package created an object.

I think in the long run, storing items in a folder tied to a site-node ends up being very useful for partitioning objects.

I guess this is all something to think about for the future, but doesn't decide this TIP.

I don't personally really care either way, but I'd like to see a clear explanation of using either a single package_id column in the acs_objects table (one-to-many mapping), vs. a mapping table to give a possibly many-to-many mapping.

Which is the better data model and why? And what are the precise semantics that OpenACS will ascribe to this field or mapping table? Note: I am not asking about insinuated performance implications, but about which is the better and more useful model. Dirk's thread above from Oct-Nov 2003 never answered that.

Posted by Timo Hentschel on
One of the main reasons for this TIP is that it's needed (as I explicitly mentioned in the TIP) for search and categories, since these are packages dealing with objects of unknown origin and there has to be a way of showing a list of arbitrary objects to the user (thus the need for an object-name at a central place) together with links to the objects (thus the need for package_id and a url derived from the object_id).

From my point of view, all that a user wants after finding the objects of his desire through search or categories is to see the object. Personally I don't want to have the very same object listed a couple of times if there's more than one package that could deal with this particular object. I would want to see each object only once in the result list, and if that means we show only the creator package and not also the editor packages, then this would be perfectly fine with me. That is why I don't really see any need for a mapping table. But if the majority here sees it otherwise with a good reason (not convinced yet) to add yet another table to join with, then so be it.
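
To make the difference concrete, here is a small sketch (Python's sqlite3 as a stand-in database) of how a search-style listing behaves under each model. The table object_package_map, the title column, and the sample rows are invented for illustration and are not part of the OpenACS data model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE apm_packages (package_id INTEGER PRIMARY KEY, instance_name TEXT);
    CREATE TABLE acs_objects (
        object_id  INTEGER PRIMARY KEY,
        title      TEXT,      -- assumed central object name
        package_id INTEGER    -- the creator package (single column model)
    );
    -- Hypothetical mapping table (many-to-many model).
    CREATE TABLE object_package_map (object_id INTEGER, package_id INTEGER);

    INSERT INTO apm_packages VALUES (1, 'File Storage'), (2, 'Photo Album');
    INSERT INTO acs_objects VALUES (10, 'holiday.jpg', 1);
    INSERT INTO object_package_map VALUES (10, 1), (10, 2);
    """
)

# Single column: one row per object, labelled with the creator package.
single = conn.execute(
    """
    SELECT o.title, p.instance_name
      FROM acs_objects o
      LEFT JOIN apm_packages p ON p.package_id = o.package_id
     ORDER BY o.object_id
    """
).fetchall()

# Mapping table: one row per (object, package) pair, so the object repeats.
mapped = conn.execute(
    """
    SELECT o.title, p.instance_name
      FROM acs_objects o
      JOIN object_package_map m ON m.object_id = o.object_id
      JOIN apm_packages p ON p.package_id = m.package_id
     ORDER BY p.package_id
    """
).fetchall()
```

With the mapping table, a result list would need an extra rule (or a DISTINCT with a chosen package) to show each object only once.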

Posted by Dirk Gomez on
The thread remained unanswered because nobody had a compelling example for or against a mapping table (and because I had lost interest in this matter).

I am siding with Timo because for the problems that are supposed to be solved I cannot see where a one-to-many relationship is necessary: If OpenACS were more data-centric I could imagine a whole lot of good uses for multiple package_ids. We are currently application-centric and I don't see that this will change too quickly, so a mapping table would only complexify things. And complexifying code for a future promise often doesn't work.

Posted by Jeff Davis on
First of all I vote yes.

On Tom's objection: I think we already have a mapping table in the form of acs_rels, where we could do many-to-many maps if needed. But for something we think has value for most objects, we would rather not add the overhead of creating an acs_rel object just to map to the owning package. Also, even if we used a many-to-many table, we would generally want a single "owning package". Enforcing that constraint in a mapping table with a schema like (package_id, obj_id, relation) would be expensive and less desirable than having a single field in the acs_objects table.
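
For illustration only, this is roughly what enforcing a single owner in such a mapping table could look like. The partial unique index is one possible technique chosen for this sketch, not something proposed in the thread, and the table name object_package_map and the relation values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE object_package_map (
        package_id INTEGER NOT NULL,
        obj_id     INTEGER NOT NULL,
        relation   TEXT    NOT NULL,
        PRIMARY KEY (package_id, obj_id, relation)
    );
    -- Extra index whose only job is to allow at most one 'owner' row per object.
    CREATE UNIQUE INDEX one_owner_per_object
        ON object_package_map (obj_id) WHERE relation = 'owner';
    """
)


def add_mapping(conn, package_id, obj_id, relation):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO object_package_map VALUES (?, ?, ?)",
                (package_id, obj_id, relation),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first_owner = add_mapping(conn, 1, 10, "owner")    # accepted
extra_editor = add_mapping(conn, 2, 10, "editor")  # accepted, not an owner row
second_owner = add_mapping(conn, 2, 10, "owner")   # rejected by the unique index
```

A single package_id column on acs_objects gets the same "at most one owner" guarantee without the extra table, index, or join.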

Furthermore we have had acs_rels for some time and it is barely used, so I don't see any compelling argument that we need to add more complexity to support something we have not used to date.

Enough packages have package_id in their object specific store that I think it's sensible to denormalize it and make an effort to ensure that all packages maintain it.

Posted by Tom Jackson on

If the package_id meant the creation package, then I can see that a single relationship would exist for each object. Possibly a 'system package' could be created and be the default? This would be used in the case where the creator wasn't known or if the object was created via some pl procedure outside of a package. For instance the package 'system package' would be created by itself. But if the package_id starts to mean different things to different developers, it isn't going to be very helpful in the long run.

In any case you also have to deal with the circular reference acs_object --> package --> acs_object.

Posted by Lars Pind on
I won't object to this, if it means that we can provide an adequate interim solution, but I'm pretty certain this isn't going to be the long-term solution to this problem.

The way I see it, we need 3 things:

1. An inexpensive way to get a URL for an object. The /o/ trick is the best solution to this I've seen so far.

2. Where in a hierarchy of objects this belongs. For a forums message, it would be the path to the forums package, but also include the forum. For a logger entry, it would include the logger package and the logger project.

In general, I want to move towards a data-centric design rather than a package-centric design, so objects don't belong to packages, but to any container - a folder, another item, etc.

3. If we want to present the item, we also need information that pertains to the particular object type, to get the name of the object and other relevant pieces of information. These other pieces of information may even depend on the object type.

(Branimir told me a trick, where a "presentation" column was stored in a central place, such as acs_objects. This column contains whatever the object type decides to store there, to use when presenting objects of this type, without hitting the type-specific tables.)

Thus, we need to talk about how this fits with the road-map.

For 5.1, if it solves an important immediate problem, I'd be okay with this solution.

But for 5.2, I'll want to revisit and think about a proper solution that will work for a data-centric approach. We should discuss that as soon as we've pushed 5.1 out.

/Lars

Posted by Dave Bauer on
I totally agree with Lars on this. He said what I wanted to say much more clearly.
Yes, we did use a column like "presentation_data" or something like that in sharenet, where every package could store presentation information unique to that package. We then had callback procs (nowadays you would use a service contract) that were called with the data from that special column. These procs then returned the appropriate html blob to present the particular object.
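
A rough sketch of that callback idea, written in Python rather than Tcl. The registry PRESENTERS, the decorator, the JSON encoding of the column, and the field names are all assumptions made for illustration; the thread does not say how the data was stored or how the procs were registered.

```python
import html
import json

# Hypothetical registry: object type -> callback that renders presentation data.
PRESENTERS = {}


def presenter(object_type):
    """Register a rendering callback for one object type."""
    def register(func):
        PRESENTERS[object_type] = func
        return func
    return register


@presenter("forums_message")
def present_forums_message(data):
    # Each type decides which fields it stored and how to show them.
    return "<b>{}</b> in {}".format(
        html.escape(data["subject"]), html.escape(data["forum"])
    )


def render(object_type, presentation_data):
    """Render one object from the central column, without type-specific queries."""
    data = json.loads(presentation_data)
    callback = PRESENTERS.get(object_type)
    if callback is None:
        # No callback registered: fall back to naming the type.
        return html.escape(object_type)
    return callback(data)


stored = json.dumps({"subject": "A & B", "forum": "General"})
snippet = render("forums_message", stored)
```

The point of the design is that a listing page only reads the central column and dispatches on object type, leaving the type-specific tables untouched.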

I didn't present this as a TIP yet since I thought agreeing on something as basic as pretty_name, package_id and id->url resolution would be easier and more needed at first.

Approve
Posted by Roberto Mello on
Approve
Changed to Approved.  Timo, could you document the "core changes and the changes to news, forums, blogger, file-storage, mailing-lists, categories" so that "someone else should do it on the other packages?"  Maybe we can finish this up at Berlin.
I didn't participate in this discussion because a) I was getting ready to go to Guatemala and b) I'd expressed my support for this many times in the past.

When will this be added to head?  I just made the contrib/portals package subsite-aware and would love to use the acs_object package_id rather than the one I just added to the portals table!

Also we need a default package for those objects for which no package_id is specified ... should we make the acs-kernel package a "magic object" to be used as default?  I'm thinking in terms of upgrade scripts for packages that currently aren't "subsite-aware" and don't currently carry a package_id in their type-specific tables.

Interesting to see the "data-centric" argument and the notion of a proper object hierarchy using a parent_id separate from context_id arise again.  That's something I've wanted (pulling it from content-item ...) for a long time.  Actually I'd like to pull context_id from the object altogether and have packages manipulate the context relationships explicitly since only the denormalized map is used (the current system requires making a new copy of the object tuple in PG as well as the map tuples every time you change the context relationships which is relatively inefficient).

But all such changes require considerable effort be put into upgrade scripts so moving forward incrementally is a good idea.

Should we put an "on delete cascade" referential action on the package_id column in acs_objects?  This would be a step in the direction of simplifying the dropping of a package instance (and this further leads to an argument that having package_id around might be a good idea even if we move towards a more data-centric POV...)

Tom Jackson wrote: "In any case you also have to deal with the circular reference acs_object --> package --> acs_object."
I just ran into a variant of this on the Oracle HEAD. In Oracle, a foreign key constraint from acs_objects.package_id is getting added in apm-create.sql, and this was breaking the install. It looks like the PG version simply doesn't define the constraint. It looks like we can keep the constraint by rearranging the code in apm_package.new to first insert into apm_packages, then update acs_objects.package_id. Not sure who is actively looking at this.
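
The ordering described above can be sketched as follows, with Python's sqlite3 standing in for Oracle. The tables are stripped down to the two mutually referencing columns, new_package is a hypothetical stand-in for apm_package.new, and the assumption that a package object's package_id points at itself is mine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    -- Circular reference: objects point at packages, packages are objects.
    CREATE TABLE acs_objects (
        object_id  INTEGER PRIMARY KEY,
        package_id INTEGER REFERENCES apm_packages (package_id)
    );
    CREATE TABLE apm_packages (
        package_id INTEGER PRIMARY KEY REFERENCES acs_objects (object_id)
    );
    """
)


def naive_new_package(conn, object_id):
    """Set package_id in the first insert; fails because no package row exists yet."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO acs_objects (object_id, package_id) VALUES (?, ?)",
                (object_id, object_id),
            )
    except sqlite3.IntegrityError:
        return False
    return True


def new_package(conn):
    """Rearranged order: object row, then package row, then back-fill package_id."""
    with conn:
        object_id = conn.execute(
            "INSERT INTO acs_objects (package_id) VALUES (NULL)"
        ).lastrowid
        conn.execute(
            "INSERT INTO apm_packages (package_id) VALUES (?)", (object_id,)
        )
        conn.execute(
            "UPDATE acs_objects SET package_id = ? WHERE object_id = ?",
            (object_id, object_id),
        )
    return object_id


naive_ok = naive_new_package(conn, 99)
package_id = new_package(conn)
```

The nullable column is what makes the intermediate state legal while the constraint itself stays in place.
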
Timo Hentschel wrote: "One of the main reasons for this TIP is that it's needed (as I explicitly mentioned in the TIP) for search and categories, since these are packages dealing with objects of unknown origin and there has to be a way of showing a list of arbitrary objects to the user (thus the need for an object-name at a central place) together with links to the objects (thus the need for package_id and a url derived from the object_id)."
Lars Pind wrote: "1. An inexpensive way to get a URL for an object. The /o/ trick is the best solution to this I've seen so far."
It's late to be kibbitzing on this, but while I agree about the need to cheaply get an object's name (and, for that matter, the object creator's name), there is a simple trick for getting the object's URL. Namely, you defer those calculations and put them on another page. So every object links to "/one-object?object_id=123456", and then you have one-object.tcl do the expensive lookups which would not be feasible on a page that returns lots of objects of different types. I think this is what Lars is talking about when he refers to the "/o/ trick".
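
A small sketch of that deferral, in Python for illustration. The in-memory OBJECTS table, the TYPE_PAGES mapping, and the target URL layout are hypothetical; only the /one-object?object_id=... link shape comes from the post above.

```python
from urllib.parse import urlencode

# Hypothetical stand-ins for the lookups a real one-object.tcl would do in the database.
OBJECTS = {123456: {"object_type": "forums_message", "package_url": "/forums/"}}
TYPE_PAGES = {"forums_message": ("message-view", "message_id")}


def listing_link(object_id):
    """Cheap link for a results page: needs nothing but the object_id."""
    return "/one-object?" + urlencode({"object_id": object_id})


def resolve(object_id):
    """The expensive part, done once per click instead of once per listed row."""
    obj = OBJECTS[object_id]
    page, param = TYPE_PAGES[obj["object_type"]]
    return obj["package_url"] + page + "?" + urlencode({param: object_id})


link = listing_link(123456)
target = resolve(123456)
```

A listing of many mixed-type objects only ever calls listing_link; resolve runs when the user follows one link, typically ending in a redirect.
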
For the record, I changed the fk constraint on package_id to on delete set null, since I thought having on delete cascade was too drastic, and not cascading breaks almost all delete methods.
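
The three options discussed here can be compared with a small sketch, again using Python's sqlite3 as a stand-in. The schema is reduced to the one foreign key in question (the reverse reference from apm_packages to acs_objects is left out for brevity), and the helper name is made up.

```python
import sqlite3


def objects_after_package_delete(action):
    """Delete a package and report what happens to an object that referenced it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        f"""
        CREATE TABLE apm_packages (package_id INTEGER PRIMARY KEY);
        CREATE TABLE acs_objects (
            object_id  INTEGER PRIMARY KEY,
            package_id INTEGER REFERENCES apm_packages (package_id)
                       ON DELETE {action}
        );
        INSERT INTO apm_packages VALUES (1);
        INSERT INTO acs_objects VALUES (10, 1);
        """
    )
    try:
        conn.execute("DELETE FROM apm_packages WHERE package_id = 1")
    except sqlite3.IntegrityError:
        return "delete blocked"
    return conn.execute("SELECT object_id, package_id FROM acs_objects").fetchall()


set_null = objects_after_package_delete("SET NULL")    # object kept, package_id cleared
cascade = objects_after_package_delete("CASCADE")      # object removed with the package
no_action = objects_after_package_delete("NO ACTION")  # package delete is refused
```

With set null the object survives and merely loses its package reference, which is the middle ground chosen in the post above.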