Forum OpenACS Q&A: to blob or not to blob

Collapse
Posted by Jonathan Ellis on
I want to allow users to upload photos a la the portrait code, only on a large scale. (I have ~50 MB of photos I want to put on the server, and that's just me!) However, looking at the portrait code made me wonder if the current blob implementation is the way to go and looking at threads like this made me wonder even more.

So, what kind of a performance/storage efficiency hit am I looking at if I go with the current hack? Or, is there an AOLserver driver for 7.1 beta that supports blobs?

Also, what are the objections to simply storing images on the filesystem rather than the DB? (Seems to me that if I do it that way it's a lot easier to allow the owner to say, "I want 160x120 thumbnails instead of 80x60" because then I can just re-run my imagemagick script instead of writing a temporary file and then re- writing to the DB for each one. And tarring up an images/ directory doesn't make the backup script significantly more complex, which is the only objection I can think of.)
Collapse
Posted by Vinod Kurup on
Well, I'm not expert enough to comment on the efficiency of the blob implementation, but the acs photodb module stores photos in the filesystem. I ported it to OpenACS and placed it here.
Collapse
Posted by Don Baccus on
PG 7.1 can back up the built-in large object type.  While there are now infinite-length rows there's no binary interface to such rows, so they're really best thought of as being "clobs".

So ... the current large-object hack in the PG driver has a bit of life in it.  The efficiency hit doesn't seem to be that bad compared to what you'd get if you didn't need the driver hack but still stored the binary data in the database.

Here's a filesystem vs. db pro-con argument:

In favor of the filesystem:

1. More efficient uploading and downloading.

In favor of the database:

1. The current driver hack gives you pg_dumpable tables.  So your
system backup strategy is easy - just dump the database and keep your script sources in cvs on another machine that's also backed up.

If you put blob data in files, you need to backup them as well as your  database, and guaranteeing consistency is problematic.

So I've been thinking about the possibility of supporting both in the ACS 4x content repository.  The object structure should make it easy to do so.  The notion is that a site that expects to do relatively little with files, portraits, photos etc might prefer the "all I need is pg_dump to make a consistent backup" philosophy that come with the in-database storage strategy.

On the other hand, there's no doubt that any site that does a high volume of file upload/downloads will want an in-filesystem strategy, due to the much higher efficiency of that approach.

Collapse
Posted by Jonathan Ellis on
Thanks for the responses.  Looks like I will stick with my filesystem, a la Vinod's code...
Collapse
Posted by Carl Coryell-Martin on
We here at civilution have been doing an extension of the photodb system for the OpenACS (we are adding support for collaborative photo galleries) and after hearing stories of issues with sharing large files in the file storage module we have gone with the file system.  (email us if you want to play with the code).  It seems to work just fine.
Collapse
Posted by Don Baccus on
For the record, the thread on the file-storage issue ended up uncovering a query doing a cartesian product due to an error doing the  port.

Not saying the file system approach is wrong (read my other post), but  thought I'd point this out.

Collapse
Posted by Carl Coryell-Martin on
Cool, thanks for the info on filesharing, I suppose that we should upgrade...
Collapse
Posted by Janine Ohmer on
On the other hand, there's no doubt that any site that does a high volume of file upload/downloads will want an in-filesystem strategy, due to the much higher efficiency of that approach.

Seeing as it's a year later and we now have Oracle in the mix, is this still true? Has anyone done any benchmarking of the two approaches?

I want to know because we have a client with a 12 Gb (and growing) Oracle database that consists mostly of uploaded files. I'm trying to talk them into moving them into the filesystem, mainly to make backing it up easier but if it will be faster too, so much the better!