Forum OpenACS Improvement Proposals (TIPs): TIP #132 Plans to address storage_type problem in CR

We recently ran into an interesting issue while debugging the display of portraits. It turned out that our instance was originally set up to store files in the DB and even though all newer versions of the portrait upload code explicitly use the file system, the storage_type column in cr_items was not being changed appropriately. The result was that the CR had the content on the file system but was trying to serve it from the DB - the items had become corrupted.

Looking into the issue with Don a little more closely, we decided that storage_type probably shouldn't be in cr_items after all but rather with the revision itself.

We decided to propose a short term and longer term solution. In the short term, the APIs (tcl and plsql) should honor the storage_type option, i.e., set storage_type in cr_items to match the storage_type of the new revision (if no live_revision is set or if is_live is passed in and is 't'). That will corrupt earlier revisions but at least the current revision will be OK. Any place where the live_revision is changed should also probably do a sanity check (thought that would not be pretty - we'd need to look at the actual content to determine storage type since we really cannot make any assumptions).

Longer term, the plan would be to move the column into cr_revisions and provide the required upgrade scripts.

Collapse
Posted by Dave Bauer on
That is, the new REVISIONS should honor whatever the original cr_items.storage_type was set to.

This is done explicitly in file-storage. It makes much more sense to NOT corrupt an item, but handle each item and store all revisions for that item in the same way.

Fixing content::revision::new to use the items storage type always is the best thing to do.

This should be MUCH simpler than checking when a revision is set live.

What you are saying is that you don't support the idea of being able to change the storage_type of an item once it's been created.

I'm not sure I agree but I don't have empirical data that this is an incredibly useful feature to back me up. Does anyone else have strong feelings about this?

Collapse
Posted by Dave Bauer on
Why is it useful?

You state it causes serious problems and it corrupts the item. It seems MUCH simpler to use the already working code to just keep all revisions of an item the same storage type.

If you want to move storage type to cr_revisions, then I fully support revisions with different storage types, but not until then.

OK, let me backtrack - I guess I wasn't clear because your answer isn't. The feature I was asking for feedback about is the ability to change storage_type for existing items, not any specific piece of code. Are you saying you think that's useful? That appears to be the intent of the original code but I don't have any use cases to support making it work that way. If anyone else has an opinion...

Anyhow, what's there now is broken, or at least capable of breaking very easily and does not behave as it should. I proposed having the revisions proc truly honor that option and change it accordingly in cr_items at the expense of old revisions. Yes, it could be a little messy. Another route would be to simply remove the -storage_type option temporarily but if everyone thinks this ability is desirable, that seems a little silly - perhaps we should just wait and do it right by moving the column to cr_revisions.

I am in this discussion on Dave's side. The only interesting use-case use-case i see currently is that changing the storage type could provide an easy means for a developer to revise in a release an earlier decision about the storage type. This ability could save the developer from the need to write a change script. I don't see the need e.g. for an end-user to say "please store this revision as file or text".

From the data-model i would also conclude that the original intention of the design was not to change the type during revisions, so the current *interface* has a bug. It seems quite natural for me to save uploaded files with the storage type file and input strings of a form as strings (the lob type is a candidate for deletion in my opinion). Providing an interface where one adds once a file and once a string is unusual. If one needs such a feature, it could be easily provided with a custom type, using storage type "file" and a custom attribute "content" of type text.

So, why not keep things simple and remove essentially the storage type option from content::revision::new, and mark content::revision::update_content as private for the time being. This change is essentially a bug fix.

If there is indeed in the future a strong reason for making the cr even more powerful by allowing to switching between storage types, this should be tipped again.

-gustaf neumann

Collapse
Posted by Dave Bauer on
Michael my objection to this change is explained right here
" It turned out that our instance was originally set up to store files in the DB and even though all newer versions of the portrait upload code explicitly use the file system, the storage_type column in cr_items was not being changed appropriately. "

There is a bug in the portrait code. It should respect the storage type of the cr_item when adding a new revision. If it doesn't that is where the problem is. So hacking the content repository to change the storage type when you change the live revision does not even solve your problem. It only masks the problem. Code that allows viewing of non-live revisions will still break. The only sane solution right now is to make sure all revisions use the same storage type.

To fix your problem with portraits, I suggest deleting all the old revisions, and fixing the storage type to match the latest revision. Since portraits don't support accesing the old revisions this should not affect the user at all.

Although logically, I understand why storage_type belongs with the cr_revision, I can't see any practical reason to store different revisions of one cr_item as different storage types. I think the real fix is to make sure, once the item is created, all subsequent revisions have the same storage type.

Gustaf - I was not arguing necessarily for ability to change the storage_type of an existing item. I was looking for feedback about whether it's useful or whether that was even the original intent. You may be right that the original intent was not to allow the changing of storage type. In fact, the PLSQL API does NOT have that storage_type option which would back that interpretation up. If we go with that interpretation, then you are exactly right - it's a bug in the interface. It's not, however, a bug in the portrait code. It's either a bug in the interface OR a bug in the API code (and possibly the underlying data model). If application code uses the CR API as documented, the CR should behave accordingly.

As I mentioned in my previous post, I'm OK with removing the option if nobody sees an immediate use for changing the storage_type on existing items. That's certainly the easiest thing to do.

Collapse
Posted by Don Baccus on
It's either a bug in the interface OR a bug in the API code (and possibly the underlying data model). If application code uses the CR API as documented, the CR should behave accordingly.

As I mentioned in my previous post, I'm OK with removing the option if nobody sees an immediate use for changing the storage_type on existing items. That's certainly the easiest thing to do.

I vote for this option. Michael's entirely correct, the API should match the datamodel. If storage_type isn't in cr_revisions it shouldn't be offered as a param in cr_revision::new.

Now - how much code will taking it out of the API break?

A history lesson - storage type was an OpenACS addition to ACS 4. My idea, actually, to allow the choice, but mostly implemented by Dan Wickstrom.

I think this datamodel/API mismatch arose because at the time we were just learning the core code (very different than ACS 1-3) while breaking pieces out from tcl files into XQL files, porting queries, inventing the tree structure code for PG to overcome lack of connect by, etc etc.

In other words, Dan and/or I just made a boo-boo because this feature, while important, from a coding point of view was really quite minor.

So it's a bug from the beginning of the OpenACS porting effort ...

Collapse
Posted by Dave Bauer on
I think passing in storage type is basically a shortcut from content::item::new, since we already know the storage type, it seems unecessary to query the database to get it.

Obviously its more consistent to always get the storage type from the cr_item from the database.

Collapse
Posted by Dave Bauer on
Ok, Michael do you want to review this and see if there is a consensus and summarize the proposal after this discussion?