Forum OpenACS Development: OACS content repository services

Posted by Lee Denison on

The content repository seems to be a collection of services brought together in a convenient form for a developer who wants their content versioned, categorised, workflow controlled, searchable, etc.

What is the definitive list of services that the content repository intends to provide for content?

I have:

  • Versioning
  • Categorisation
  • Keyword Association
  • Workflow
  • Search
  • Content Organisation (Folders)
  • Content Dispatching

Does this represent only one permutation of services a developer might want for the CR? Is the OACS CR intended to be _all_ services a developer might want in a CR?

Has anyone thought about abstracting these services to allow a developer to combine them in whatever fashion they need? Or would the restrictions of modularity created by that abstraction be too much of a performance hit?

Posted by Lee Denison on
Whilst I'm thinking about it.

Why are folders and categories treated separately?  Is it for performance reasons when dispatching a request for an item?  Is it because folders are the canonical taxonomy when dispatching a request for an item?

Is there a distinction between a taxonomy of categories in which a content item is placed, and a set of keywords assigned to a content item?

I would say that an item is likely to exist at only one point in a category tree, whereas it is likely to have many keywords associated with it.  I would also say that categories are more likely to be organised in a tree, so that an item automatically exists in the parent categories of its direct category association, whereas keywords are more likely to be in a flat structure with thesaurus-type relations between them.  Do these distinctions matter?
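A rough illustration of the two structures described above, as a hypothetical Python sketch. The category names, the `categories_for` and `expand_keywords` helpers, and the one-hop "related term" lookup are all assumptions made for illustration; none of this is OpenACS code.

```python
from typing import Dict, List, Optional, Set

# Hypothetical category tree: child -> parent (None marks a root).
CATEGORY_PARENT: Dict[str, Optional[str]] = {
    "sport": None,
    "football": "sport",
    "premier-league": "football",
}

# Hypothetical flat keyword set with symmetric "related term" links.
RELATED: Dict[str, Set[str]] = {
    "soccer": {"football"},
    "football": {"soccer"},
}


def categories_for(direct: str) -> List[str]:
    """Return the item's direct category followed by every ancestor,
    i.e. all the categories the item implicitly belongs to."""
    chain: List[str] = []
    node: Optional[str] = direct
    while node is not None:
        chain.append(node)
        node = CATEGORY_PARENT[node]
    return chain


def expand_keywords(keywords: Set[str]) -> Set[str]:
    """Add directly related keywords (one hop; no hierarchy is walked)."""
    expanded = set(keywords)
    for kw in keywords:
        expanded |= RELATED.get(kw, set())
    return expanded


print(categories_for("premier-league"))  # ['premier-league', 'football', 'sport']
```

The point of the contrast: one category assignment fans out upward through the tree, while a keyword only picks up its thesaurus neighbours.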

Posted by Dave Bauer on
Lee,

I can answer the keyword questions:

Categories and Keywords are the same objects. It is just a difference in the CMS interface and documentation. A cr_keyword with children magically becomes a "Keyword Category". That is all; otherwise they are exactly the same. I am planning a user interface to the keyword services separate from the CMS package, and I think having two different terms for a top-level keyword with children and one without is confusing.

So to answer your question, keywords may be arranged in a tree, but it is not necessary.

Posted by Lee Denison on

Dave,

Speaking of a new interface to categorisation, the OACS seems to be powerful in its ability to categorise content, but not so powerful when it comes to using that structured information. Some uses for categorisation that I can think of are:

  • Restricting searches by category.
  • Mapping a category tree to part of the URL space; this would need to be supported by category navigation widgets or something.
  • Generating a site index by category; if multiple category trees exist, content could be indexed by primary category, then by secondary categories within that.
  • etc...

Indeed it might be nice to be able to associate alternate names for a category with it, so that if you were looking to show Yahoo! style search results with matching categories displayed, you could also match alternate names for a category.
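A minimal sketch of the alternate-names idea, assuming each category simply carries a set of aliases. The data and the `matching_categories` function are hypothetical, and the case-insensitive whole-string match is just one possible choice.

```python
from typing import Dict, List, Set

# Hypothetical data: canonical category name -> alternate names.
CATEGORY_ALIASES: Dict[str, Set[str]] = {
    "Football": {"soccer", "association football"},
    "Motor Sport": {"motorsport", "racing"},
}


def matching_categories(query: str) -> List[str]:
    """Return canonical names of categories whose name or any alternate
    name equals the query (case-insensitive, whole-string match)."""
    q = query.strip().lower()
    hits = []
    for name, aliases in CATEGORY_ALIASES.items():
        candidates = {name.lower()} | {a.lower() for a in aliases}
        if q in candidates:
            hits.append(name)
    return sorted(hits)


print(matching_categories("Soccer"))  # ['Football']
```

Results always come back under the canonical name, so a Yahoo!-style listing would show "Football" even when the user searched for "soccer".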

Either way I was thinking it might be good to be able, in a new interface, to associate some meta-information with a category tree indicating the intended uses of that category tree.

Thoughts?

Posted by Dave Bauer on
Allowing separate category trees for different URLs would be very useful for a site with multiple unrelated subsites. I don't think it is possible to support this with the current structure, which is very simple: a table of keywords and a keyword map table that maps each keyword/content-item pair.
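The structure Dave describes, together with his earlier point that a keyword with children is what the CMS calls a category, can be sketched with an in-memory SQLite database. The table and column names here are hypothetical and only loosely follow the description; they are not the actual OpenACS schema.

```python
import sqlite3

# One keyword table (parent_id lets keywords form a tree) and one map
# table of keyword/content-item pairs. Names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE keywords (
    keyword_id INTEGER PRIMARY KEY,
    parent_id  INTEGER REFERENCES keywords(keyword_id),
    heading    TEXT NOT NULL
);
CREATE TABLE item_keyword_map (
    item_id    INTEGER NOT NULL,
    keyword_id INTEGER NOT NULL REFERENCES keywords(keyword_id),
    PRIMARY KEY (item_id, keyword_id)
);
""")
conn.executemany("INSERT INTO keywords VALUES (?, ?, ?)", [
    (1, None, "Sport"),
    (2, 1, "Football"),
    (3, None, "Weather"),
])
conn.executemany("INSERT INTO item_keyword_map VALUES (?, ?)", [
    (100, 2),
    (101, 3),
])
conn.commit()


def keyword_categories(conn):
    """Keywords that have at least one child: what the CMS labels categories."""
    rows = conn.execute("""
        SELECT k.heading FROM keywords k
        WHERE EXISTS (SELECT 1 FROM keywords c WHERE c.parent_id = k.keyword_id)
        ORDER BY k.heading
    """)
    return [r[0] for r in rows]


def items_with_keyword(conn, heading):
    """Content item ids mapped to the keyword with the given heading."""
    rows = conn.execute("""
        SELECT m.item_id FROM item_keyword_map m
        JOIN keywords k ON k.keyword_id = m.keyword_id
        WHERE k.heading = ?
        ORDER BY m.item_id
    """, (heading,))
    return [r[0] for r in rows]


print(keyword_categories(conn))  # ['Sport']
```

Nothing in these two tables says which subsite or package a tree belongs to, which is why per-URL or per-package trees would need extra structure.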

Also allowing package-based category trees would be good. Bboard and other packages now implement their own keyword lists and maps. The keyword system needs to be standardized across the system if it is to be useful.

Category based searching also should be on the list, but we need to get the packages using the keywords before that will be necessary.

There is definitely a lot of work that can be done to enhance the keyword system. I had just been thinking it might be advantageous (or maybe not) to allow keyword assignment to acs-objects instead of just content-items. That depends on whether your opinion is that all content for the entire site goes into the content repository.

Posted by Lee Denison on
IMHO associating keywords with acs_objects instead of cr_items is absolutely the right answer.  I agree: the fact that several packages have their own keyword/category implementations is a good indicator that the service needs to be generalised - so that, for example, my breakdown of bboard posts by "sport" is equivalent to my breakdown of news by "sport".

The question of whether all content should be stored in the content repository goes back to my first post.

If you regard the content repository as a convenient group of services, then it is no longer a question of whether content is in the content repository - it is a question of which services do you want to provide to this group of content.

Currently, if I want my news to go through workflow, be searchable and have keywords assigned, but I don't want the overhead of versioning, I have no choice.  If I could pull any combination of the distinct services together to form my own content repository, then I would have more flexibility.

I know that this flexibility would come with a cost in terms of system complexity.  I believe, for example, that the abstraction of services is a major factor in the performance and learning curve problems present in ACS JAVA.  I still believe, however, that categorisation is a service that should be system wide - and that its functionality should be made available to end users more often ;)

Posted by Don Baccus on
Yeah, you're right about categorization needing to be generalized, that's certainly on our wanna-do list.  The symptoms you see in existing code (packages rolling their own) are often an indicator that the aD folks were working under time pressure to complete a given package rather than sit back and wait for a better solution.

I don't think the item-revision structure of the content repository is so burdensome as to cause a text content-oriented package to avoid using it.  Strictly speaking, for instance, the bboard package doesn't need revisioning (at least as currently implemented).

It does add a little overhead to the API, and of course two rows (the item and its revision) need to be joined before content can be presented.  But in return we get a simpler structure and a greater degree of commonality (well, the latter's more theory than a current truth given the fact that various packages abuse the CR to some extent or another - more stuff to clean up in the future).

Posted by Stephen . on
I don't think workflow is dependent on the content repository. Keywords, categories and folders etc. are, but they needn't be (shouldn't be?). Only versioning is integral to the structure of the CR.
Posted by Lee Denison on

Don: I get the impression from your response that the answer to my original question, "Does the OACS content repository represent one permutation of services that a developer might want, or _all_ services a developer would want in a general repository of site content?", is the latter. In other words, whenever I want a couple of the services for my content, I get all of them and simply don't use the rest.

From this point of view, all site content should be in the OACS CR and should simply not use the services that are not appropriate for that content. If the OACS CR meets its design goals, then this shouldn't be a significant overhead.

Stephen: You seem to imply the alternative. If the OACS CR doesn't depend on workflow, and could be made independent of categories, folders, etc then the OACS CR reduces to a versioning service. If all of these services have become distinct, each module would then choose the appropriate combination of services for its content.

From this point of view, the OACS CR as it currently stands is just one way of combining the various services that are relevant to Karl's CMS and various other packages. Consequently, content for other packages (bboards, general comments, etc) should not be stored in the current OACS CR.

Which direction is OACS heading at the moment?

Posted by Stephen . on
There is no difference between choosing not to use services 'included' with the CR and choosing to use services available to any acs object. The only reason some services are coupled to the CR at all is that the service references cr_items.item_id rather than acs_objects.object_id.
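To make the coupling point concrete, here is a hypothetical SQLite sketch in which the keyword map's foreign key targets a base object table instead of a content-item table. The table names, the `tag` and `tagged_with` helpers, and the object type strings are assumptions for illustration only, not the OpenACS data model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE objects (
    object_id   INTEGER PRIMARY KEY,
    object_type TEXT NOT NULL
);
CREATE TABLE keywords (
    keyword_id INTEGER PRIMARY KEY,
    heading    TEXT NOT NULL
);
-- The key difference: object_id references the base object table,
-- not a content-item table, so any object type can be tagged.
CREATE TABLE object_keyword_map (
    object_id  INTEGER NOT NULL REFERENCES objects(object_id),
    keyword_id INTEGER NOT NULL REFERENCES keywords(keyword_id),
    PRIMARY KEY (object_id, keyword_id)
);
""")
with conn:
    conn.executemany("INSERT INTO objects VALUES (?, ?)",
                     [(1, "bboard_message"), (2, "news_item")])
    conn.execute("INSERT INTO keywords VALUES (1, 'sport')")


def tag(conn, object_id, keyword_id):
    """Map an object to a keyword. Returns False if a constraint fails
    (unknown object or keyword, or the pair is already mapped)."""
    try:
        with conn:
            conn.execute("INSERT INTO object_keyword_map VALUES (?, ?)",
                         (object_id, keyword_id))
        return True
    except sqlite3.IntegrityError:
        return False


def tagged_with(conn, heading):
    """All objects, of any type, carrying the given keyword."""
    return conn.execute("""
        SELECT o.object_id, o.object_type
        FROM objects o
        JOIN object_keyword_map m ON m.object_id = o.object_id
        JOIN keywords k ON k.keyword_id = m.keyword_id
        WHERE k.heading = ?
        ORDER BY o.object_id
    """, (heading,)).fetchall()


tag(conn, 1, 1)  # a bboard post...
tag(conn, 2, 1)  # ...and a news item share the same "sport" keyword
print(tagged_with(conn, "sport"))  # [(1, 'bboard_message'), (2, 'news_item')]
```

Only the `REFERENCES` target changes compared with an item-only map; the rest of the keyword service is untouched.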

Maybe the CR doesn't need to exist. One thing missing from acs_objects is a name column. The first specialization that package objects make is to add one (of different sizes, and with multiple inheritance it's a big pain keeping them synced). Why not add a clob/text column too? I'm not so sure about the remaining guts from cr_items/cr_revisions, but acs_objects could stand to be fatter.

Posted by Don Baccus on
The CR doesn't depend on workflow, but one idea some of us have been batting around is to provide some common workflows (and supporting HTML widgets and Tcl API) for content stored there.  Maybe as a separate package, maybe integrated, I haven't decided yet (workflow and CR are both core packages so they're always available).

Getting rid of the CR and putting clobs/text in acs_objects?  I don't particularly care for that idea, to be honest.

Posted by Lee Denison on
Let me phrase my question another way.  If all of the services I mention above were made to apply equally to acs_objects and cr_items, would there be any point in trying to ensure that all site content goes into the content repository?
Posted by Don Baccus on
Well, no, but at that point the content item would in essence _be_ the base object in the system.  Categorization and the like should be applicable to any object in the system, but I see no reason to have the base object in the system implement versioning, for instance.
Posted by Lee Denison on
I could be wrong, but my feeling is that the reason versioning keeps getting singled out in this thread is because it is the only service in my list above that is intrusive in its implementation.  In other words it's the only service which affects the way you design the datamodel for your content.

As Stephen points out, for all the other services, choosing not to use a service geared for the CR has no particular overhead.  They could also be equally well applied to acs_objects as cr_items.

_If_ (big if) versioning were implemented as a transaction log external to the content's own datamodel, would there still be a problem with allowing acs_objects to be versioned?
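One way the "big if" might look, as a hypothetical Python sketch: a journal that lives outside the objects' own storage and appends a snapshot on every recorded change. The `Journal` class and its methods are assumptions for illustration; this is not any existing OpenACS service, nor how the CR stores revisions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Journal:
    """Hypothetical revision log kept outside the objects' own tables.

    Each recorded change stores a shallow snapshot of the object's state,
    keyed by object_id. Objects that are never recorded cost nothing."""
    entries: Dict[int, List[Dict[str, Any]]] = field(default_factory=dict)

    def record(self, object_id: int, state: Dict[str, Any]) -> int:
        """Append a snapshot and return its 1-based revision number."""
        revisions = self.entries.setdefault(object_id, [])
        revisions.append(dict(state))  # copy, so later edits don't rewrite history
        return len(revisions)

    def revision(self, object_id: int, number: int) -> Dict[str, Any]:
        """Return a copy of the object's state as of revision `number`."""
        if number < 1:
            raise ValueError("revision numbers start at 1")
        return dict(self.entries[object_id][number - 1])

    def revision_count(self, object_id: int) -> int:
        return len(self.entries.get(object_id, []))


# The "live" table holds only current state, as any object's table would.
objects: Dict[int, Dict[str, Any]] = {
    42: {"title": "Draft"},
    43: {"title": "Unversioned post"},
}
journal = Journal()

journal.record(42, objects[42])
objects[42]["title"] = "Final"
journal.record(42, objects[42])

print(journal.revision(42, 1))    # {'title': 'Draft'}
print(journal.revision_count(43))  # 0
```

Storing whole snapshots keeps the sketch simple; a real log might store diffs instead, trading storage for reconstruction cost. Either way the object's own datamodel is unaware of versioning, which is the property the question hinges on.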

If not, I think OACS would benefit from having its base object type have direct access to those services needed by site content.  For example, if the bboards also arranged their posts in folders, administration staff who author content using a folder-based interface could also moderate from the same interface.  If that were suggested with the current CR, I think the main objection would be that the posts would then have to carry the versioning overhead.

Posted by Don Baccus on
There's already a journaling service in the kernel ... The bboard already keeps its content in the CR and indeed if you visit the CMS you can see the contents of the bboard and other CR clients. The bboard doesn't need versioning so one could argue that use of the CR is overkill.

Is the datamodel design we've inherited perfect? Nope.

Does it suck? Nope.

Is it going to get totally restructured in some serious manner? Not likely:

  • It would be a huge undertaking
  • We have barely scraped together the resources to port what exists and to add relatively minor extensions
  • There are serious shortcomings, particularly in the realm of admin and CMS UI, that need to be tackled
  • We're really lacking in packages still.
  • There are a lot of folks interested in building sites soon, not far off in the future
  • etc etc