Forum OpenACS Q&A: file-storage 4.x changing where files are stored

Hi,

I have a production site that uses file-storage 4.x.  I wish to change
the parameter StoreFilesInDatabaseP to 0 so it stores the content on
the filesystem.  Based on a cursory look at the code and from what I
know it should be safe to mix existing content residing in the
database and filesystem.

Am I correct?  Any opinions regarding this?  I know the pros and
cons of each; I just want to make sure that switching is not a stupid idea.

It should also be safe to move from filesystem to database... I think.

Yes, it is ok to change StoreFilesInDatabaseP. But it will change storage type only for new uploads, not the existing ones (hence the warning on the parameter).

And I'm sure you want to experiment first on a non-production site. 😉

Posted by Jun Yamog on

Hi Jowell,

I am glad that I tested this on a test server. Apparently www/version-add-2.tcl reads the ad_parameter value rather than the existing storage_type of the content item. So new uploads follow the parameter, but retrieval follows the item's storage type. A content item can then have its cr_revisions spread across two storage locations, leaving your data lost in limbo.

Do you think we should change this behavior to look up the storage_type column rather than the current parameter? This would make file-storage safe for using both storage types. Do you think I should put it into the SDM?

What do you think about the current file-storage performance problem? Thread here

Well, if you read the storage type from the APM parameter, it will be a big mess. Changing the APM parameter will render all of your current files in CR unreadable. That is unless you put a trigger on the APM table that converts the storage mode of file-storage for all previously stored files when the APM parameter changes... and I really don't even want to entertain the idea of putting triggers all over the place (much less triggers on non-file-storage tables).

With the current approach, file-storage abstracts away from the storage type (i.e., you don't have to know what it is) for all operations except file creation. It uses Don's tcl routine cr_write_content to take care of actually retrieving content (Don's routine is responsible for making the code so much cleaner). I think this is safer and "internally consistent". Even if you change the APM parameter midway, all of your files will still be readable even without explicit conversion of storage_type of previously stored files.

Perhaps what you could do is to have a proc that could be run when the server isn't busy, which will convert storage_type for all files in file-storage based on the current APM parameter. You need not change any existing file-storage code (because of the "internally-consistent" nature of how file-storage uses CR). This proc will be a nice addition to file-storage.
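
A rough sketch of what such a conversion pass might look like, written in Python against an in-memory model rather than the real CR tables. The `Item` record, the `"lob"`/`"file"` labels, the two dict "stores", and `convert_all` itself are hypothetical names for illustration only; an actual proc would be Tcl working against cr_items and cr_revisions.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical stand-in for one file-storage content item."""
    item_id: int
    storage_type: str    # "lob" (database) or "file" (filesystem); labels assumed
    revision_ids: list   # ids of the revisions belonging to this item


def convert_all(items, store_in_database_p, db_store, fs_store):
    """Move every item's revisions into the storage chosen by the parameter.

    db_store and fs_store are plain dicts keyed by revision id, standing in
    for the database and the filesystem. Returns the number of items converted.
    """
    target = "lob" if store_in_database_p else "file"
    if target == "lob":
        source, dest = fs_store, db_store
    else:
        source, dest = db_store, fs_store

    converted = 0
    for item in items:
        if item.storage_type == target:
            continue  # already where the parameter says it should be
        for rev_id in item.revision_ids:
            # Copy first, then remove the old copy, so an interrupted run
            # never leaves a revision with no content at all.
            dest[rev_id] = source[rev_id]
            del source[rev_id]
        # Flip the flag only after every revision of the item has moved.
        item.storage_type = target
        converted += 1
    return converted
```

Running the pass a second time with the same parameter value converts nothing, so it could be scheduled repeatedly during quiet hours.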

The other thing is that when you change storage type from database (the default) to filesystem, given that you already have some files in CR, it is true that some of your files will be in the database and some in the filesystem. But is it really that critical? You have to back up both the filesystem and the database anyway when you use the filesystem. You could afford to "wait" for a more opportune time to do the storage_type conversion if you really want to have just one type of storage for all your file-storage files (or for performance reasons). Triggers are not necessary in this case, IMHO.

As for the file url, this is needed so that you have a nice default filename when you are actually downloading the file. Without the url, your default filename on downloads will be "index". I really haven't explored the performance issues, which you probably know more about than I do. But if constructing the file url is indeed the bottleneck and you don't seem to need it, removing it from the query should be ok.

It seems like this would be a good administrative option: switch
all files to the database, or switch all items to the filesystem. The
administrator will be switching it anyway, so doing it manually
wouldn't be that hard to do if the functionality was there.
Posted by Jun Yamog on
Hi Jowell,

Reading is not the problem, since CR is smart enough to figure out the storage, as you explained above.  The problem is writing new versions: the code takes the storage type from the parameter rather than from the original item.  When CR reads, it looks at the item's storage type, so it does not look in the storage where the new version was actually written.

What I did was actually very simple.  In version-add-2.tcl I read the storage type from the db rather than reading the current parameter, then changed a single "if" statement.  With this approach file-storage is able to store on the file system and in the db at the same time safely.
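
To make the difference concrete, here is a small Python model of the two behaviours. It is only a sketch of the logic as described in this thread, not the actual version-add-2.tcl code; the function names and the `"lob"`/`"file"` labels are made up for illustration.

```python
def storage_for_new_version_by_param(item_storage_type, store_files_in_database_p):
    """Problem behaviour: ignore the item and trust the current parameter."""
    return "lob" if store_files_in_database_p else "file"


def storage_for_new_version_by_item(item_storage_type, store_files_in_database_p):
    """Fixed behaviour: a new version of an existing item keeps the item's
    own storage type, whatever the parameter currently says."""
    return item_storage_type


def readable(item_storage_type, written_to):
    """Reads follow the item's storage_type, so a version is only found
    if it was written to that same storage."""
    return item_storage_type == written_to


if __name__ == "__main__":
    item_type = "lob"   # item was created while files went to the database
    param = False       # admin has since set StoreFilesInDatabaseP to 0
    print(readable(item_type, storage_for_new_version_by_param(item_type, param)))  # False
    print(readable(item_type, storage_for_new_version_by_item(item_type, param)))   # True
```

The parameter would still decide where brand-new items go; only new versions of existing items stop consulting it.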

It would really be great to have that proc to move all content from the db to the file system and vice versa.

With regard to the url, a query on the view did not yield any results for me, and grepping around turns up no other code using it.  I may be wrong, since my file-storage is already modified.  I just use the cr_items.name column for the file name, so I don't get the default "index" as the filename.

Maybe we should put this quick fix into file-storage so it will continue to work even when the admin changes the parameter.  Also, if Yon's new file-storage will not make it into 4.5 final, I think we should study the performance problem and my solution to it.  What do you think?  Who is currently maintaining file-storage?  Do you think these fixes should be placed in the SDM?

Posted by Jun Yamog on
I have uploaded a patch to the SDM.  Apparently this is still a problem in 4.5.