Forum OpenACS Development: Idea: New Content Paradigm

Posted by Simon Carstensen on
Disclaimers: I have little knowledge of the current CMS. And "paradigm" is perhaps a bit over the top :)

I was talking about content management in general terms with a friend the other day, and it started off a series of thoughts. So here they are, although I'm sure the current CMS already incorporates some of them.

There are of course many ways of categorizing and presenting content, depending on the framework it is presented within (e.g. article, Q&A, news). Content might be organized in sections, by subject, or by category.

An alternative, and perhaps more flexible and useful, way of presenting content might be using keywords. The text is just text and the structure is created through the use of keywords connected to the text. That's pretty standard.

There are different kinds of keywords, though. Some describe the content type (news, comment, post, etc.) and others describe the subject (OpenACS, photography, beer).

Combining these two kinds of keywords, we get different "views". To create a page that displays news, ask for content that has the "news" keyword attached, ordered by date. Want "news about beer"? Ask for content that has both the "news" and "beer" keywords attached. This way the user (i.e. the CMS admin) creates the structure.
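A minimal sketch of the keyword "views" idea, written in Python purely for illustration: the names `ContentItem` and `view` are hypothetical and not part of any OpenACS API. A view is just "all items carrying every requested keyword, newest first", so a content-type keyword and a subject keyword combine freely.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ContentItem:
    # Hypothetical item: just text plus the keywords attached to it.
    title: str
    published: date
    keywords: set = field(default_factory=set)


def view(items, *required_keywords):
    """Return the items tagged with every required keyword, newest first."""
    wanted = set(required_keywords)
    matching = [item for item in items if wanted <= item.keywords]
    return sorted(matching, key=lambda item: item.published, reverse=True)


items = [
    ContentItem("New OpenACS release", date(2003, 2, 1), {"news", "openacs"}),
    ContentItem("Brewery opens downtown", date(2003, 2, 10), {"news", "beer"}),
    ContentItem("My favourite stout", date(2003, 1, 5), {"post", "beer"}),
    ContentItem("Beer tax raised", date(2003, 1, 20), {"news", "beer"}),
]

news_page = view(items, "news")          # content-type keyword only
beer_news = view(items, "news", "beer")  # content type + subject
```

The structure lives entirely in the keyword sets; adding a "news about photography" page means calling `view` with different keywords, not writing a new package.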

Instead of having huge chunks of text (an article, for example), all content is divided into smaller items of text, or subelements, each with its own set of preferences/characteristics (header, text, bgcolor; it could be many things). These may be assembled into an article, but may just as well be structured into a Q&A forum, depending on the view. Or even an RSS feed. Content would be completely separated from presentation this way. Dividing content into small units like this makes it not only easy to create different views, but also easy to edit articles, move sections around, delete them, etc.

An article might consist of 5 sections, which in the new system would be 5 elements, each with its own header, text and other characteristics.
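To make the "one set of elements, many views" point concrete, here is a small Python sketch. `Section`, `render_article` and `table_of_contents` are hypothetical names; the sketch only assumes that each section carries a header and text, with presentation decided by whichever view renders it.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Section:
    # One small content element; presentation (templates, colours) lives elsewhere.
    header: str
    text: str


def render_article(title, sections):
    """Article view: all sections joined into one HTML page."""
    parts = [f"<h1>{escape(title)}</h1>"]
    for section in sections:
        parts.append(f"<h2>{escape(section.header)}</h2>")
        parts.append(f"<p>{escape(section.text)}</p>")
    return "\n".join(parts)


def table_of_contents(sections):
    """Outline view of the same elements: only the numbered headers."""
    return [f"{n}. {s.header}" for n, s in enumerate(sections, start=1)]


sections = [
    Section("Intro", "Why beer matters."),
    Section("Brewing", "Malt, hops & water."),
    Section("Tasting", "Pour and enjoy."),
]

# Editing the article is plain list manipulation:
# move "Tasting" to the front and drop "Brewing".
edited = [sections[2], sections[0]]
```

Because each section is a separate element, reordering or deleting one never requires re-parsing a large block of text.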

Those were the initial thoughts. I was thinking along the lines of deprecating lars-blogger and creating this CMS, which would make it easy to present content both as a regular article and as a weblog. In other words, the initial thought was that two packages resembling each other so much shouldn't co-exist.

Then I went a little further and thought perhaps this could be extended to OpenACS at large. Content would live in the Content Repository, and packages like News and lars-blogger would be predefined views. You wouldn't have to set up the news package to handle news and lars-blogger to write an entry; content would be in one place.

The hard part would be creating an API flexible enough to support all these views (it reminds me of the Unix philosophy of small, sharp tools, each intended to do one thing well) without having to build any procs specifically for a package. That way "content packages" could live purely as "views", supported by applications such as general-comments and news-aggregator to further spice things up. It would be a sort of "content-builder" :)

I am not sure this is even possible. I do know that it is a little far-fetched, and I certainly don't expect the OpenACS community folks to jump out of their chairs :) But perhaps it might stir some ideas? If nothing else, for the CMS.

/Simon

Posted by Michael Feldstein on
What you're describing sounds vaguely like a piece I read recently on XFML:

http://diveintomark.org/archives/2002/12/03/this_is_xfml.html

Posted by Tom Jackson on

I have struggled over whether to use the CMS because the items I consider content seem to be spread out among the different attributes of a table row. Why are the article paragraphs more important than the title? In my case, I don't have huge chunks of data, so choosing which attribute should be the content seems kind of arbitrary.

Another point is that the content usually associated with the content repository is deeply structured. Having a news article stuffed into a single attribute is more than arbitrary; it is incorrect.

It is easy to argue that this is the best we can do, that the 'content' is 'html'. Wrong! Content is not html. In five years most content will be contained in an xml document, or something else.

It seems inevitable that, in the future, the fine details of a real world document will be represented in a structured markup language, not as attributes in a database table. How is OpenACS going to address this issue?

If you have pure data, a markup language is way too heavy. For structured data, you could try the Rmadilo language, which should parse faster than CSV and represent the same structured, non-markup data as XML.

Posted by Dave Bauer on
Simon,

The content repository is already designed to handle content in this way. We need improved APIs for developers and after that, improved UI applications for content creators.

That is, a content item can have child items of differing types. You can identify which types of children an item may have. So you can build up a page from various content items.
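A toy Python model of "child items of differing types, with the allowed child types declared up front". The class, the type names and the `ALLOWED_CHILDREN` table are all hypothetical; the real CR keeps this kind of rule as metadata in the database rather than in application code.

```python
class ContentItem:
    # Hypothetical registry of which child content types each parent type allows.
    ALLOWED_CHILDREN = {
        "page": {"paragraph", "image"},
        "paragraph": set(),
        "image": set(),
    }

    def __init__(self, content_type, name):
        self.content_type = content_type
        self.name = name
        self.children = []

    def add_child(self, child):
        """Attach `child` if its type is registered as allowed under this item's type."""
        allowed = self.ALLOWED_CHILDREN.get(self.content_type, set())
        if child.content_type not in allowed:
            raise ValueError(
                f"a {self.content_type} may not contain a {child.content_type}"
            )
        self.children.append(child)
        return child

    def walk(self, depth=0):
        """Yield (depth, item) pairs, parent before children."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


home = ContentItem("page", "home")
home.add_child(ContentItem("paragraph", "intro"))
home.add_child(ContentItem("image", "logo"))
```

A page is then "built up" by walking its children in order and rendering each with its own template.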

You don't want to store bgcolor or other presentation information in the content items themselves. You can specify templates for folders, content items, and possibly content types (if not, we should fix that), and you can have different templates depending on the context in which the item is being viewed.
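A sketch of how context-dependent template lookup could work. The precedence used here (item, then content type, then the nearest enclosing folder) is an assumption for illustration, not a statement of how the CR actually resolves templates; the function name and the `.adp` file names are hypothetical.

```python
def resolve_template(item, context, item_templates, type_templates, folder_templates):
    """Pick the template used to render `item` in `context` (e.g. "public", "rss").

    Assumed precedence (illustrative only): a template registered for the item
    itself, then one for its content type, then the nearest enclosing folder.
    """
    template = item_templates.get((item["id"], context))
    if template is None:
        template = type_templates.get((item["type"], context))
    folder = item.get("folder")
    while template is None and folder is not None:
        template = folder_templates.get((folder["id"], context))
        folder = folder.get("parent")
    if template is None:
        raise LookupError(f"no template for item {item['id']} in context {context!r}")
    return template


root = {"id": "root", "parent": None}
news = {"id": "news", "parent": root}
article = {"id": 42, "type": "article", "folder": news}

item_templates = {}
type_templates = {("article", "rss"): "article-rss.adp"}
folder_templates = {
    ("root", "public"): "default.adp",
    ("root", "print"): "print.adp",
    ("news", "public"): "news.adp",
}
```

The content item itself stores no presentation data; the same article renders through `news.adp`, `article-rss.adp` or `print.adp` depending only on the context requested.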

Most of this is not apparent to the average developer, and that is the problem we need to focus on: making the existing functionality easy to tap into.

Tom, a title is an attribute of the content, not the content itself, I am not sure what problem you have that the CR can't solve. You can create new content types and store additional attributes in a type specific table.

Posted by Ola Hansson on
> It seems inevitable that, in the future, the fine details of a real world document will be represented in a structured markup language, not as attributes in a database table. How is OpenACS going to address this issue?
The same way we're addressing it today I'd guess.

I.e., for the time being "OpenACS" represents most of its documents, a.k.a. pages, in HTML. Most of the time, some of the resulting markup is defined in the ADP template by the designer and some of it is defined by the content contributor. That "user provided" content might include markup too; I've got a pair of <blockquote> tags in here, for instance 😉 ...

If this is strange or bad in any way, I haven't noticed it yet.

If in the future XML or some other structured markup language becomes the norm, we would have to a) change the markup in the templates from HTML to ?ML, and b) teach the users to use the new markup.

The reason the body of the forum posting or the description of the bug is not divided into further parts is that only the author of that piece of the content would know what it should look like.

Oh well, I probably missed your point anyway...

Or are you saying that we should get rid of all columns but one in an "articles" table (say), store all the article attributes together with the markup in that column, and then duplicate the markup for all rows? Hardly. 😊

Can you elaborate a bit? This is interesting.

Posted by Tom Jackson on
> a title is an attribute of the content, not the content itself

Yes, and I think the CR works great when you have a dominant chunk of data you want to call content. If you think in terms of data, each attribute, be it title or content, is equivalent. The CR has chosen to make one attribute more important than the others. If your application has multiple equally weighted attributes, or if your content has a deeper structure you want to expose, the CR isn't going to work too well.

If the content were stored as XML or some other structured format, you could still create indexes, but you could also search for more specific information. I'm just pointing out that when you get outside the one-big-chunk-of-content paradigm, you start having to work to make your application use the CR.

Posted by Tom Jackson on
> Can you elaborate a bit? This is interesting.

Sorry, Ola, I was posting at the same time as you.

Don't get me wrong, I am warming up to the CR. I just don't have one big chunk of content per item, so I'm wondering if it is going to work. I am also wondering what to do about video and images. Although these are content, they hardly need editing; it is more likely that the attributes would be edited than the main content item.

Anyway, with structured data, would it be possible to create a web page that allowed you to edit the individual items instead of entering the markup directly? If there were a schema for the item, maybe you could have controls for adding chunks of information.

If you know how to write the structured markup, you could still add it in-line and on the next update the form would change to allow further editing. I don't know how useful that would be, but a single textarea box to edit a huge document is almost useless.

Posted by Jeff Davis on
Tom, in cms you can set the default content type to "no_content" which then does not provide the content upload/entry fields (although it does still ask for mime type but that is just an easily fixed bug). It then acts more like what you are talking about.

For things like images, where you usually don't want to version the large binary object but probably do want versioning on the other attribute data, you create two content types: one for the image with minimal data (in the case of the image type provided in the CR, width and height) and a second content type with all the other attributes you want to maintain, and then just relate the two objects.
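A Python sketch of the two-type split, assuming nothing beyond what the paragraph describes. `ImageItem` and `PhotoInfoItem` are hypothetical stand-ins for the two content types, and the relation is modelled as a plain id field rather than the CR's actual relation mechanism.

```python
from dataclasses import dataclass, field


@dataclass
class ImageItem:
    # Minimal type for the large binary; not versioned, so it is stored once.
    item_id: int
    data: bytes
    width: int
    height: int


@dataclass
class PhotoInfoItem:
    # Second, hypothetical type for the editable attributes, versioned on every edit.
    item_id: int
    image_id: int  # the relation to the ImageItem
    revisions: list = field(default_factory=list)

    def edit(self, **attrs):
        """Store a new revision: the previous attributes plus the changed ones."""
        latest = dict(self.revisions[-1]) if self.revisions else {}
        latest.update(attrs)
        self.revisions.append(latest)
        return latest


image = ImageItem(item_id=1, data=b"\x89PNG...binary...", width=640, height=480)
info = PhotoInfoItem(item_id=2, image_id=image.item_id)
info.edit(title="Sunset")
info.edit(caption="Taken at dusk")
```

Two edits to the title and caption produce two small revisions, while the image bytes exist exactly once.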

CMS will actually create the columns and views for the content type for you given the attribute metadata, has code to generate the creation and editing forms, handles doing inserts and updates of content, has provisions for storing an XML version of each revision, lets you associate a template with a content type, lets you relate content items fairly easily, and will write rendered content to the filesystem. It does not all work flawlessly but an awful lot of it is quite close to working and would work correctly with a little more attention. There really isn't anything built into the CR or CMS to prevent you from having pure "content" and avoiding big hunks of html as the content.
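To illustrate "generate the creation and editing forms and the DML from attribute metadata", here is a deliberately small Python sketch. The metadata dictionaries, the `photosi` view name and both functions are hypothetical; the CMS's real form and DML generation is Tcl and works from the attribute metadata in the database.

```python
from html import escape

# Hypothetical attribute metadata for a "photo" content type.
# An attribute with no default is treated as required.
ATTRIBUTES = [
    {"name": "title", "pretty_name": "Title", "datatype": "string", "default": None},
    {"name": "width", "pretty_name": "Width", "datatype": "integer", "default": None},
    {"name": "caption", "pretty_name": "Caption", "datatype": "text", "default": ""},
]

# Map attribute datatypes to HTML input types; anything unknown gets a text box.
INPUT_TYPES = {"string": "text", "integer": "number"}


def build_form(attributes, action):
    """Generate an HTML entry form from the attribute metadata."""
    lines = [f'<form method="post" action="{escape(action)}">']
    for attr in attributes:
        name = escape(attr["name"])
        required = " required" if attr["default"] is None else ""
        label = f'<label for="{name}">{escape(attr["pretty_name"])}</label>'
        if attr["datatype"] == "text":
            widget = f'<textarea id="{name}" name="{name}"{required}></textarea>'
        else:
            kind = INPUT_TYPES.get(attr["datatype"], "text")
            widget = f'<input type="{kind}" id="{name}" name="{name}"{required}>'
        lines.append(f"{label} {widget}")
    lines.append('<input type="submit" value="Save">')
    lines.append("</form>")
    return "\n".join(lines)


def insert_statement(view_name, attributes):
    """Generate the parameterized INSERT for the submitted form (DB-API "%s" style)."""
    columns = ", ".join(attr["name"] for attr in attributes)
    placeholders = ", ".join("%s" for _ in attributes)
    return f"INSERT INTO {view_name} ({columns}) VALUES ({placeholders})"
```

Adding an attribute to the metadata list changes both the form and the INSERT without touching any hand-written code, which is the property being described.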

Well, the only real roadblock is that the UI of CMS is beyond trainwrecklike. But the backing code really does a lot of smart things for managing structured data. Where it does less well is for composite objects where you would want to edit a parent and n children (e.g. a wp slide where you would want to edit the slide and the bullet points and have the slide and each bullet point be a cr_item) but I am doing some work that will make that possible as well.

Anyway, I would strongly recommend anyone contemplating creating yet another CMS take a look at what the current cms can do and consider how much easier it would be to fix the UI for the existing system than to create a new one from scratch.

Posted by Tom Jackson on

Jeff, all good points. As I said, I am looking into using the CR, and I was thinking of breaking down my cr_items as much as possible. I like the idea of being able to relate any item to any other item, and the idea of parent/child relationships is useful too. I haven't had much interest in the automatic insert/update/delete stuff since I use my own code for that, but I should look more closely at what it is doing.

I know it has to be good stuff, but my pea sized brain always seems to have trouble grasping other people's code.

Posted by Jeff Davis on
Jim, the big advantage of the CMS insertable view trick is that you don't hit any limits on the number of parameters as you would with __new functions in postgres. It supports defaults in the metadata (and in the table definition). The only disadvantage would be if you really do need to do anything fancy in the new method (although for CR items that seems mostly not to be the case). Of course, if you are talking about small bite-sized things with few attributes, then that is not an issue. For photos, for example, if you create attributes for the useful EXIF data fields you already have more than 16 fields. Given how many parameters are gobbled up by just the basic metadata, this seems to be a serious problem for most types.

Talli, there is a table cr_revisions_attributes which is supposed to hold an XML representation of a given content revision, and there are things in Oracle to generate the XML (although I have not looked to see if this is present in postgres). The intent is to use it as a secondary representation of the given revision (intended originally for indexing, I think, but there is no reason you couldn't use it to feed an XSL styling step or embed it in a larger XML document to export a content collection). It's not really related to RSS at all (other than both being XML). There is no provision for making the primary datastore XML rather than table-based. You can of course store XML in the content field, but there is no pleasant way to query it other than with full text search, or to do other RDBMS things to it (like sorting or updating).

Posted by Talli Somekh on
Jeff, can you speak more about the ability of the CMS to generate XML from revisions? Specifically, vis-à-vis generating RSS from the content. Is this something peculiar to the train-wrecked CMS or something inherent to the CR?

talli

Posted by Jun Yamog on
Hi,

I have been away from the community for a while; I've been very busy. So apologies if I blurt out something unrelated.

Regarding versioning in the CR: the recommended way to extend it is through cr_revisions, not cr_items. As Tom pointed out, this means that with big images you will have 2 big copies of the image even if you just edit the title. Still, if you use the file system to store binary files, there is nothing stopping you from pointing 2 revisions at the same file. Of course that breaks the rules of the data model design, but it's still possible. It's also possible to inherit from cr_items, much like cr_folders does. But then the revision procs may not run properly, and you may need to write some of your own. Although a good number of them are still reusable, the insert on a view will surely not work anymore.

I think the current design of the CR is pretty good. CCM, however, uses another type of versioning: it groups the attributes to be versioned. In some cases that's simpler, in other cases not. For example, the title of an image actually points to another table that holds the different versions of the title. Still, neither the OACS nor the CCM CR/CMS data model has a clear advantage over the other. Look at this post; it's pretty interesting. It also shows how ACS 3.x used to do versioning (I think 3.x e-commerce used that type of versioning).

https://listman.redhat.com/pipermail/redhat-ccm-list/2003-February/000791.html

Regarding storing things as XML: I think it would still be best to create a content type with additional columns to store the information rather than storing it as content. Although it's not as sexy as it sounds, it's relatively easy to make XML from a bunch of columns. So yes, Talli, you can create RSS or pretty much anything from the CR. Also, it would be harder to query or search XML content. Using a bunch of tables also gives you the flexibility to present content items in ways you had not designed for, unlike XML, which is more rigid.
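A sketch of "making XML for a bunch of columns" using Python's standard `xml.etree.ElementTree`. The row dictionaries stand in for query results, and their column names are hypothetical (chosen to match the RSS 2.0 item elements); nothing here is an OpenACS API.

```python
import xml.etree.ElementTree as ET


def rss_feed(title, link, description, rows):
    """Build an RSS 2.0 document from plain column values.

    `rows` stands in for query results (one dict per revision); the column
    names are hypothetical and chosen to match RSS item elements.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    for tag, text in (("title", title), ("link", link), ("description", description)):
        ET.SubElement(channel, tag).text = text
    for row in rows:
        item = ET.SubElement(channel, "item")
        for tag in ("title", "link", "description", "pubDate"):
            if row.get(tag) is not None:  # skip empty columns
                ET.SubElement(item, tag).text = str(row[tag])
    return ET.tostring(rss, encoding="unicode")


rows = [
    {"title": "Brewery opens", "link": "http://example.com/news/1",
     "description": "A new brewery opened downtown.", "pubDate": None},
    {"title": "Beer tax raised", "link": None,
     "description": "Prices go up next month.", "pubDate": None},
]
feed = rss_feed("Beer news", "http://example.com/news", "News about beer", rows)
```

Since the data stays in ordinary columns, it can still be sorted, filtered and joined in SQL before this final serialization step.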

Regarding the creation of a new CMS package: some have seen its alpha state, although it's been 2 months now since anything moved, as it's not the project that is putting food on the table. I agree with Jeff to some degree that fixing the current UI of the CMS may be easier. But in my case, Robert Locke's advice to me still holds: "adding is easier than subtracting". Also, CMSs may function alike in most cases, but almost every client wants a really different UI, one that works for their business/style. So I want to put the parts that change, which are mostly the UI, in one package and the less changing parts in another. The theory is that when another client wants a different UI, at least I know which parts will not change. There is a clear distinction between the UI part, the service part and the content-serving part.

Hope this is in line with the discussion.

Posted by Jun Yamog on
Hi Jeff,

I am assuming I am Jim, since Jun is the closest to Jim :)

Yes, indeed, I agree that inserting into a view is a lot better than calling __new. This is why I use this method in the CMS I am trying to make. I also submitted a small fix for this to DanW. Another plus is that it makes queries db-independent, at least if the db supports rules on views, plus DanW's good hack. I don't use the view just for inserting, though; I also use cr_foo_typex to get some stuff. Sometimes it's more convenient than joining acs_objects, cr_items, cr_revisions and foo_table. Not sure if it's fast or not.

So if anyone is planning to inherit from cr_items and will need revisioning (a weird decision), this is one thing that I am sure will not work anymore. I did not follow this path; I am too lazy to create my own. Also, a secondary objective of mine is to be CMS compatible. That means BCMS will be able to manipulate the CR even if CMS is handling it, and vice versa. Again, it's not a primary objective, but a nice-to-have.

Talli,

Jeff says it better than me regarding storing XML, although it's possible to mix both. This is what I am employing right now, although it's not OACS; I believe it's applicable to OACS too. I have a table with columns that are important for the content type, but the main body/content still has a lot of XML. This XML doesn't need to be related to anything or be a living entity (well, at least not yet; it would be some real work putting it into table columns): parts like footnotes, chapters, etc. They haven't warranted living in separate columns. But the end result is still XML, constructed from the table columns and the body/content, which is XML too.

Posted by Jeff Davis on
Actually Jim = Tom (I was writing a note to Jim Lynch at the same time which was why I got it wrong).

On the view approach I was talking about the relative merits of something like Tom's query writer (or package_instantiate_object which is more or less the same thing) and a view based approach. I think qw is useful but given how quickly it breaks down on postgres I think it can't really be the basis of a generic content attribute editing system.

Posted by Talli Somekh on
Yeah, I would never suggest storing content as XML. That seems like a PITA given the OACS' tried and true reliance on an RDB. I was wondering more about the facilities for creating XML representations of data stored in the CR, or wherever in the OACS.

It sounds like the answer is, "They are good." Which is very cool.

talli

Posted by Tom Jackson on

Jeff,

What is the 'view trick' y'all are talking 'bout? If it eliminates the 16 param limit on __new that would be great. Since my code would be autogenerated, I certainly don't care about much more than the ability to automatically create the view and know how to use it.

(BTW although there is a 16 param limit per function, you can have as many __new functions as you want. As long as the call to qw::new includes a list of params possible in one of these functions it will work. )

I don't care to use any more pl than is necessary, and as I probably already mentioned for updates I will eliminate most pl, except something like 'where my_object__object_p('1234') = true' type of thing to ensure editing objects through the correct method.

If anyone else has suggestions on what query-writer should do, or how it should do it, please put in your thoughts now!

Posted by Jun Yamog on
Hi Tom,

You can look at acs-content-repository/sql/postgresql/content-type.sql

content_type__trigger_insert_statement and a few functions below it. What it does: when you insert into a view like cr_fooi, the trigger fires and calls the __new functions, etc.
It's similar to calling the __new functions directly, but it happens during an insert event on the view.

I guess what Jeff is suggesting is that we could make use of this for other object types as well. I think Don is looking at this too; DanW also indicates in the code that further testing needs to be done.

Regarding your query writer: what does it do? Does it write the SQL to a file for you, so you have a template to follow? Or does it use its own mechanism and write the query on the fly? Sorry that I have to ask these questions (I am coding in CCM, so it's hard for me to follow OACS; hence my comparisons of both platforms, so sorry if I compare them too much).

Some points I would like to voice about QW. I hope they are valid.

- Hopefully QW will support custom queries when needed. Automation is good, but hand tweaking is needed sometimes.

- But if hand editing is needed, how can QW keep track of the hand edits?

- We don't need to abstract the query in all cases. Or maybe the point is: instead of abstracting in the SQL layer, like content_item__new, why not just have a Tcl proc that does it, like content::item::upload? Maybe that's outside the query writer's scope, I don't know.

- Making the db easy to use with a complex data model is double-edged. On one hand, good developers can work faster; on the other hand, poor developers can create poor code faster. Or maybe poor developers can now write code that is really poor in terms of SQL performance.

Posted by Tom Jackson on

Jun,

The first thing to know about query-writer is that it only does insert/update/delete queries. Select queries are a great way to shoot yourself in the foot. I don't know any better way to specify a select query than through SQL.

Second, is that the data model of query-writer allows you to specify metadata about objects. An object could be the rows of a table, or more complex objects like an acs_object which has data in multiple tables, but an object is essentially a collection of database attributes. Each object attribute can have a default value, or no default if it is required.

Using the object attributes, you can define __new, __set_attrs, __reset_attr and __delete functions. Any number of each could be defined. This is just a specification of which attributes, and which order. The type of function determines the form of the pl body. The current query-writer makes assumptions about the object's main table pk being a direct fk of acs_objects (something not true for a lot of objects). The new query-writer will correct that problem, because it will know where the fk points to.

Also I am adding points in the function body that would allow developers to add their own special code for the case you mention. If the body is really special, you can just specify a completely hand written body. I think the body will look like this in outline, for the __new function:

  • 'declare'
  • autogen: declared vars, setup aliases
  • user added declared vars
  • 'begin'
  • user begin code
  • autogen: create parent object code (user chooses which function to call)
  • user code after parent object is created
  • autogen: create new row in object table
  • final user code
  • autogen: return new object_id
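The outline above can be sketched as a small generator that interleaves autogenerated snippets with optional user hooks. This is a hypothetical Python illustration, not query-writer's code: the section keys, `new_function_body`, and the `widget_*` PL/pgSQL snippets are invented, and the closing `end;` is added because a PL/pgSQL block needs one.

```python
# The order of the outline above; "literal" entries are emitted as-is,
# the others are looked up in the autogenerated or user-supplied snippets.
OUTLINE = [
    ("literal", "declare"),
    ("autogen", "declare"),
    ("user", "declare"),
    ("literal", "begin"),
    ("user", "begin"),
    ("autogen", "create_parent"),
    ("user", "after_parent"),
    ("autogen", "insert_row"),
    ("user", "final"),
    ("autogen", "return"),
]


def new_function_body(autogen, user=None):
    """Assemble a PL/pgSQL-style __new body, leaving out empty user hooks."""
    sources = {"autogen": autogen, "user": user or {}}
    lines = []
    for source, key in OUTLINE:
        if source == "literal":
            lines.append(key)
        elif sources[source].get(key):
            lines.append("    " + sources[source][key])
    lines.append("end;")
    return "\n".join(lines)


# Hypothetical snippets for a "widget" object type.
autogen = {
    "declare": "v_widget_id integer;",
    "create_parent": "v_widget_id := widget_parent__new(p_name);",
    "insert_row": "insert into widgets (widget_id, name) values (v_widget_id, p_name);",
    "return": "return v_widget_id;",
}
user = {"after_parent": "perform widget_log__note(v_widget_id);"}
body = new_function_body(autogen, user)
```

Because user code lives in named hooks rather than inside the generated text, regenerating the autogen parts after a schema change does not overwrite the hand-written pieces.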

The current query-writer relies on a laborious process of adding object attributes by hand through a web interface. The new version will eliminate that and use a Tcl API for specifying objects and defining functions.

I think I have said it before, but query-writer has three APIs. The first allows the developer to quickly port Oracle queries to PostgreSQL and works in .xql files. The second is a Tcl API which is used on any Tcl page that does updates/inserts/deletes; it can be used with ad_form, etc. The third is a URL-based API which allows the developer to use specially named form variables to create forms which can have any number of different objects and attributes per form, or multiple instances of a single object type. A single form can be used to insert/update/delete in a single click, and the backend query processing doesn't need to be programmed. I wrote a 500+ file package with very few new-2.tcl or other x-2.tcl files. The current version is available as an apm.

I notice the group type code creates similar pl, but fails quickly when the group type has more than a few attributes. When the new query-writer is finished, it should produce code that would also work for creating the group type code, but it would write to a file and allow user code to be mixed in. Since you can have multiple __new functions, the limit on group type attributes would be removed by using this package.

I still want to find out how the views thing is supposed to work. If they still use __new functions, those would still have to be hand-written in the case of more than 16 attributes; as a matter of fact, I doubt they would work the same. Maybe they create an acs_object, then the child of that, and so on up to the current object. That would work, but might be harder to program automatically.

Posted by Jun Yamog on
Tom,

Wow, the QW looks interesting. I should try it one of these days. Hopefully it helps OACS developers be more productive.

Posted by Don Baccus on
Tom ... the CMS/CR (let's talk about the pair because, though Dan moved lots of non-CMS-specific procs to the CR, there's more to be moved) include Tcl API procs that automatically generate the INSERTs that create new content through the magic views generated for a content type.

Extending this for DELETE wouldn't be difficult, and UPDATEs to the CR are really insertions of new versions (if I add optional non-versionable CR items, UPDATEs will need to be figured out for real).
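A toy Python model of "an UPDATE is really an insertion of a new version". The `Item` class is hypothetical; the `live_revision` pointer loosely mirrors the CR's notion of a live revision, but nothing here reflects the actual table layout.

```python
import itertools


class Item:
    """Toy content item whose 'updates' are really inserts of new revisions."""

    def __init__(self, name):
        self.name = name
        self.revisions = {}          # revision_id -> attribute dict, never modified
        self.live_revision = None    # which revision is currently published
        self._next_id = itertools.count(1)

    def update(self, **attrs):
        """Insert a new revision (copy of the live one plus changes) and make it live."""
        base = self.revisions.get(self.live_revision, {})
        revision_id = next(self._next_id)
        self.revisions[revision_id] = {**base, **attrs}
        self.live_revision = revision_id
        return revision_id

    def current(self):
        return self.revisions[self.live_revision]


page = Item("about")
first = page.update(title="About us", body="Hello")
second = page.update(title="About OpenACS")
```

Old revisions are never touched, so rolling back is just pointing `live_revision` at an earlier id; that is also why non-versionable items would need a genuinely different UPDATE path.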

So far packages don't use this Tcl API because, clearly, it wasn't well publicized within aD when packages were written (or the CMS wasn't yet fully integrated), and, well, we've been a bunch o' weenies here just trying to get existing stuff to work without doing much to move forward and improve things.

Now we're starting to catch up, though, and exploring the CMS/CR is quite the treasure hunt.

Anyway ... my major problem with your QW work is that we already have two Tcl APIs for generating objects - package_instantiate_object and the CMS stuff. Rather than add another and have three ways to do this stuff, I'd rather see us figure out one approach to take - the view approach, something like your approach, whatever (though I think the view approach is cool myself) - and make it work and adopt it widely throughout the toolkit.

So my lack of enthusiasm for a new approach is just my usual lack of enthusiasm for adding something new to work around the fact that we've got two existing half-assed approaches that are incomplete. Let's work towards one that works and let's use it everywhere!

Posted by Tom Jackson on

I wish someone would outline the 'view trick', because reading the source is giving me a headache. It looks like some pl code creates a view which is a bunch of joins going back to acs_objects. What I saw is very specific to the cr/cms system. Then a rule is created which runs when someone tries to insert on the view. The rule consists of running one of the cr_*__new functions and then inserting stuff into another fixed table.

What you end up with is a bunch of pl code that is auto-generated from a set of acs_attributes.

That is all nice and good, but it is only a small part of what query-writer does right now, and it is very specific. The new version of query-writer will be more generic, will produce very little pl, and will write the pl to a file mixed with custom, hand-written code if necessary. Query-writer also caches all the variables it needs, so no additional db overhead is created when insert/update/delete are performed.

One reason I wrote query-writer to begin with was to help manage the form generation process and help multiple classes of users use a single form. Query-writer does this by allowing the developer to control access to attribute values, and to the operations (new, set, del) on every attribute or attribute value.

So far, the examples I have seen in OpenACS are limited to a set of hard-coded methods. They all use pl to generate more pl. This doesn't mean the ideas are not good, but they are far from being generally useful.

Query-writer doesn't solve all ills either. The main thrust of the package is to make new development follow a consistent set of methods for insert/update/delete on objects, and to control who can do what with an object/attribute/value.

The merchant-system package I wrote has 19 object tables. If you have an app with one or two tables, it doesn't really matter if you custom write everything, but with 19 it is a problem. The unicycling website I am working on will likely have at least that many object tables. The current version of qw could work for this already, but I am so lazy that I want it to do a better job. Simple development of the sql code should be a little slower than writing the datamodel (create table), but it should also bring quite a lot of extra benefits.

I know there is some concern about the limit of 16 attributes per function in pg. I think this is compile-time configurable, so in the unlikely case that an object requires more than 16 attributes with no defaults, maybe a recompile of pg would be easier than forcing the creation of views and rules for every case and everyone who uses OpenACS.

The current limit for query-writer is 32 attributes. Unless the size of an integer value can be bigger than 32 bits, this would be the upper limit. (Query-writer uses bit-wise operators to choose a matching function.)
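A guess at how bit-wise matching of __new functions could work, written in Python. This is not query-writer's actual algorithm: it assumes each attribute gets one bit, a variant matches when it has a parameter for every supplied attribute and no unsupplied required parameter, and the smallest matching variant wins. All function and variant names are hypothetical.

```python
MAX_ATTRIBUTES = 32  # one bit per attribute in a 32-bit mask


def attribute_bits(attributes):
    """Give each attribute of an object its own bit."""
    if len(attributes) > MAX_ATTRIBUTES:
        raise ValueError(f"at most {MAX_ATTRIBUTES} attributes fit in the mask")
    return {name: 1 << position for position, name in enumerate(attributes)}


def mask_of(names, bits):
    mask = 0
    for name in names:
        mask |= bits[name]
    return mask


def choose_function(supplied, functions, bits, required=()):
    """Return the __new variant that accepts every supplied attribute.

    A variant matches if it has a parameter for each supplied attribute and
    every required (no-default) parameter it takes was supplied. Among the
    matches, the one with the fewest parameters wins.
    """
    supplied_mask = mask_of(supplied, bits)
    required_mask = mask_of(required, bits)
    candidates = []
    for name, params in functions.items():
        fn_mask = mask_of(params, bits)
        accepts_all = (supplied_mask & ~fn_mask) == 0
        missing_required = fn_mask & required_mask & ~supplied_mask
        if accepts_all and not missing_required:
            candidates.append((bin(fn_mask).count("1"), name))
    if not candidates:
        raise LookupError("no __new variant accepts these attributes")
    return min(candidates)[1]


bits = attribute_bits(["name", "price", "color", "weight", "sku"])
functions = {  # hypothetical variants of a product__new function
    "new_basic": ["name", "price"],
    "new_sku": ["name", "sku"],
    "new_full": ["name", "price", "color", "weight", "sku"],
}
```

Each variant stays under pg's per-function parameter limit, while the masks let a caller supply any subset of up to 32 attributes and still find a matching function.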

Posted by Jeff Davis on
You pretty much have the view trick. It's a set of dynamically generated rules on views that allow inserts into the view, which do cascaded inserts into the parent tables. It's not an entirely general solution for invoking arbitrary plsql, but on the other hand it makes building up content types from metadata quite easy. The end result is that anyone creating a content revision only has to do a normal insert statement.

It's all in acs-content-repository/sql/{postgresql,oracle}/content-type.sql, in particular content_type__create_type, content_type__create_attribute and content_type__refresh_view.
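The mechanism can be demonstrated end to end with Python's built-in `sqlite3`. Note the substitution: the CR generates PostgreSQL rules (Oracle uses triggers), whereas SQLite has no rules, so this uses an `INSTEAD OF INSERT` trigger as an analogue. The tables `objects` and `photos` and the view `photosi` are hypothetical stand-ins for acs_objects, a type table and its insertable view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Parent table, playing the role of acs_objects / cr_revisions.
    CREATE TABLE objects (
        object_id   INTEGER PRIMARY KEY,
        object_type TEXT NOT NULL,
        title       TEXT
    );
    -- Subtype table holding the type-specific attributes.
    CREATE TABLE photos (
        photo_id INTEGER PRIMARY KEY REFERENCES objects(object_id),
        width    INTEGER,
        height   INTEGER
    );
    -- The "insertable view": the subtype joined back to its parent.
    CREATE VIEW photosi AS
        SELECT o.object_id, o.title, p.width, p.height
        FROM objects o JOIN photos p ON p.photo_id = o.object_id;
    -- Cascade an insert on the view into the parent table, then the subtype table.
    CREATE TRIGGER photosi_insert INSTEAD OF INSERT ON photosi
    BEGIN
        INSERT INTO objects (object_type, title) VALUES ('photo', NEW.title);
        INSERT INTO photos (photo_id, width, height)
            VALUES (last_insert_rowid(), NEW.width, NEW.height);
    END;
""")

# Callers only ever write a plain INSERT against the view.
conn.execute(
    "INSERT INTO photosi (title, width, height) VALUES (?, ?, ?)",
    ("Sunset", 640, 480),
)
```

The caller never names the parent table and never passes parameters positionally to a function, which is why the 16-parameter limit on functions does not come into play.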

In addition, the cms system has code for building forms and performing the necessary dml for adding and editing content. It needs a little extension work, since it does not support multiple forms per content type, but it does work reasonably well already. So if you actually supply all the metadata that's needed, you don't end up writing any dml, let alone plsql.

Some of the work I am doing now will expose that API so that packages using the CR but not CMS will be able to use the cms automatic form generation/dml without the full-blown CMS API.

I looked at query-writer but I didn't really understand it completely. It would be useful if you had more sample code somewhere (although I guess since the metadata is all in the DB, that's not as straightforward as it sounds, right?)