Forum OpenACS Development: problems with unique name constraint in cr_items -- file-storage

While working through the steps to migrate 3.2.5 sites to 5.x, I've encountered a problem caused by the unique constraint on the name column in cr_items. The problem is actually a bug that was opened long ago.

File-storage, now based on the CR, stores the file name of an uploaded file in the name column of cr_items. This column has a uniqueness constraint on it (and its parent) that presumably has some rationale, but here's how it causes problems. Consider this use case:

Two different users launch Word to write an autobiographical narrative. Each saves their text using the default name Word offers: "Document1.doc". The first user uploads her file into a file-storage instance on the site using the title "Amy's Story". So far so good. Now the second user uploads his file into file-storage using the title "The Life of John". Even though this second document has nothing in common with the first and in fact has a different title, the CR assumes it is a new version of the first file because it has the same file name. So instead of John's document showing up as a separate file, it is mysteriously uploaded as a new version of Amy's file -- and her file "disappears" as an old version that no longer shows in the file listings.
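To make the use case concrete, here is a toy in-memory model of the behavior described above. It is only a sketch: the Folder and Item classes and their methods are made up for illustration and are not the content repository schema or API. The dict keyed by file name stands in for the uniqueness constraint on name within a parent folder.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical stand-in for one item; not the real cr_items row."""
    name: str                                      # file name, unique within its folder
    revisions: list = field(default_factory=list)  # titles, oldest first

    @property
    def live_title(self):
        # The most recently added revision is treated as the live one.
        return self.revisions[-1]


class Folder:
    """Hypothetical folder; dict keys model the unique (parent, name) constraint."""

    def __init__(self):
        self.items = {}

    def upload(self, filename, title):
        # Same file name in the same folder: treated as a new version
        # of the existing item rather than as a new file.
        item = self.items.get(filename)
        if item is None:
            item = Item(filename)
            self.items[filename] = item
        item.revisions.append(title)
        return item

    def listing(self):
        # Only the live revision of each item is shown.
        return [item.live_title for item in self.items.values()]


folder = Folder()
folder.upload("Document1.doc", "Amy's Story")
folder.upload("Document1.doc", "The Life of John")
print(folder.listing())  # ['The Life of John']
```

Amy's upload is still there as an older revision, but the listing shows only John's title.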

The name collision itself cannot be prevented server-side: users will name their uploaded files however they want, and sooner or later they will choose identical names for completely unrelated files that they nonetheless want to upload into the same file-storage folder.

The patch I suggested at the time fixes this by changing the filename of the second uploaded file when the upload is an "add file" operation. This appears to me to be the proper behavior, but DaveB, at least, didn't agree back then; maybe others didn't or don't either. Is anyone else running into this? If not, how do you avoid it?

So, what do people think should happen here? I'll commit this fix if people agree that file-storage shouldn't step on users' uploads the way it does now.

Stan,

The filename is the URL of the file, so you can only have one in every folder.

What should REALLY happen is, if a file with the same name exists, the system should tell the user, and suggest a unique name, but allow the user to change it.

The confusion comes from people trying to use file storage in two ways: 1) file sharing with extended attributes like "title", and 2) a replacement for a filesystem.

The filesystem metaphor conflicts with the title metaphor. If you use file storage like a filesystem, of course creating a new copy of a file with the same name will overwrite the old one. A good application might warn you if this is happening.

So someone (not me) needs to decide how file storage will work. Perhaps a parameter could be added to turn this behavior on or off.

Ah yes; the filename is the URL. Seems like the desire to avoid URLs like "/storage/one-file?file_id=1234" creates some complications.

In any case, there already is a "BehaveLikeFilesystemP" parameter in the package, but this parameter right now only sets whether in folder-chunk a file's title links to /download/ or to /view/.

I suppose that some behaviors could be added to this parameter on the "file upload" side:

BehaveLikeFilesystemP = 1:
A newly uploaded file that has a name conflict with an existing file would generate a new version of the existing file. This doesn't actually overwrite the old one (as you point out it would in a real filesystem), but it effectively does because it becomes the latest/active version of the file.

BehaveLikeFilesystemP = 0:
A newly uploaded file that has a name conflict with an existing file would generate a new file, just as if there were no name conflict. Behind the scenes, it would modify the filename by appending an integer (e.g. if there already is a filename "myduplicate.doc", then the second uploaded filename would be "myduplicate-1.doc").
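As a rough illustration of the integer-suffix idea for BehaveLikeFilesystemP = 0, here is a minimal sketch. The function name unique_name and the choice to put the suffix before the extension are assumptions made for this example (the latter matches the "myduplicate-1.doc" case above); it is not code from file-storage.

```python
import os


def unique_name(filename, existing_names):
    """Return filename unchanged if it is free, else append -1, -2, ...

    Hypothetical helper: existing_names is the set of names already
    used in the target folder.
    """
    if filename not in existing_names:
        return filename
    stem, ext = os.path.splitext(filename)
    n = 1
    # Keep counting until the candidate name is not taken.
    while f"{stem}-{n}{ext}" in existing_names:
        n += 1
    return f"{stem}-{n}{ext}"


print(unique_name("myduplicate.doc", {"myduplicate.doc"}))
# myduplicate-1.doc
```

A real implementation would also have to deal with two uploads racing for the same suffix, which this sketch ignores.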

These upload and download behaviors could be controlled by their own parameters, though, too. That might be clearer and more flexible.

Here is a patch that checks the BehaveLikeFilesystemP parameter of a file-storage instance to handle this situation:

  • If BehaveLikeFilesystemP == 1, then file-add.tcl "overwrites" the existing file by adding the new upload to it as a new version and making that new version the live one -- i.e. exactly as the current code works.
  • If BehaveLikeFilesystemP == 0, then file-add.tcl alters the filename by appending the new file_id (well, object_id) to provide a guaranteed-unique name, then adds the new upload as a new file in the folder.
  • The patch also provides some I18n instructions explaining what's going on.
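The decision the patch makes can be sketched roughly as follows. This is a Python illustration, not the Tcl in file-add.tcl: the function plan_upload, its arguments, and the action labels are invented for the example, and placing the object_id before the extension is an assumption, since the description above only says the id is appended.

```python
import os


def plan_upload(filename, existing_names, behave_like_filesystem, new_object_id):
    """Decide what an upload should do; returns (action, name_to_store).

    Hypothetical sketch of the branching described in the list above.
    """
    if filename not in existing_names:
        # No conflict: a plain new file in either mode.
        return ("new_file", filename)
    if behave_like_filesystem:
        # BehaveLikeFilesystemP == 1: new live version of the existing file.
        return ("new_version", filename)
    # BehaveLikeFilesystemP == 0: make the name unique with the new object_id.
    # Assumption: the id goes before the extension.
    stem, ext = os.path.splitext(filename)
    return ("new_file", f"{stem}-{new_object_id}{ext}")


existing = {"Document1.doc"}
print(plan_upload("Document1.doc", existing, True, 1234))
# ('new_version', 'Document1.doc')
print(plan_upload("Document1.doc", existing, False, 1234))
# ('new_file', 'Document1-1234.doc')
```

Because each object_id is issued only once, the second branch needs no search for a free suffix.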

This is a pretty simple fix, so unless I hear screams from someone soon, I'll go ahead and commit this to oacs-5-2.