Forum OpenACS Development: Re: Hash value for files

2: Re: Hash value for files (response to 1)
Posted by Don Baccus on
"Additionally the 1:n relationship between cr_items and cr_revisions
has to be amended to support a n:m relationship (one revision can belong to
multiple items)."

How do you propose to do this without breaking everything that already exists?

Simpler would be to allow multiple revisions to point to the same file (if LOBs are stored in the file system).  The PG BLOB hack already allows for multiple columns to point to the same BLOB ... not sure whether Oracle's LOBs support this.

I still question whether this functionality is useful to enough people for us to implement it as part of our core functionality.  File systems the world over, on a large variety of operating systems, manage without it, and indeed, if this were of high importance I would think the file system would be the place to implement it.  If the application layer, not the file system, is the "right" place to implement it because it is of limited use, then the same argument applies to our core "file system" (content repository) service, doesn't it?

An additional point:

"The calculation shall happen while uploading and should be possible to be
done on only part of the file. This shall prevent a 5MB file to be uploaded
by the user just to realize the whole bandwidth was not necessary as the
file was already there."

Wouldn't this require support for partial file upload from AOLserver?  AFAIK when you push "submit" on a file upload form, AOLserver slurps the entire file onto your server before OpenACS can intervene.

3: Re: Hash value for files (response to 2)
Posted by Jeff Davis on
I do think storing binary content keyed by hash is a good idea, but I would rather implement trigger-maintained reference counting than some complicated view for garbage collection.
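
To make the reference-counting idea concrete, here is a minimal in-memory sketch in Python. It is only an illustration: the class and method names (HashedBlobStore, add_revision, drop_revision) are hypothetical, SHA-256 is an assumed choice of hash, and in the content repository the counter would be maintained by database triggers rather than application code.

```python
import hashlib


class HashedBlobStore:
    """Toy store: content keyed by its SHA-256 digest, one refcount per blob."""

    def __init__(self):
        self.blobs = {}     # digest -> bytes
        self.refcount = {}  # digest -> number of revisions pointing at the blob

    def add_revision(self, data: bytes) -> str:
        """Register one more revision for this content; store it only once."""
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blobs:
            self.blobs[digest] = data
            self.refcount[digest] = 0
        self.refcount[digest] += 1
        return digest

    def drop_revision(self, digest: str) -> None:
        """Release one reference; delete the blob when nobody points at it."""
        self.refcount[digest] -= 1
        if self.refcount[digest] == 0:
            del self.refcount[digest]
            del self.blobs[digest]


store = HashedBlobStore()
first = store.add_revision(b"same bytes")
second = store.add_revision(b"same bytes")  # second revision, same content
store.drop_revision(first)                  # blob survives: one reference left
```

Because the blob is deleted at the moment its count reaches zero, no separate garbage-collection pass is needed.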

I think the idea that you could upload only a little bit of a file and check the hash of the beginning is a bad one. What if I have a large DocBook document and I edit the afterword? The beginning will match and the end will be different. And as Don mentioned, there is no way to stop an upload like that without changing AOLserver itself. The only real way to implement this is to provide client-side software that computes the hash of the whole document locally and uploads only if it has changed. Something like that would be very nice, especially in the context of something like photobook, where you might want to sync your local photo collection to the server periodically.
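
A short Python sketch of both points, for illustration only. The function names (prefix_digest, full_digest, needs_upload) are hypothetical, SHA-256 is an assumed hash, and the set of digests the server already holds is simply passed in, since how a client would obtain it is not specified in this thread.

```python
import hashlib


def prefix_digest(data: bytes, n: int) -> str:
    """Hash only the first n bytes (the partial-file idea)."""
    return hashlib.sha256(data[:n]).hexdigest()


def full_digest(data: bytes) -> str:
    """Hash the whole document, as a client-side tool would."""
    return hashlib.sha256(data).hexdigest()


def needs_upload(local: bytes, server_digests: set) -> bool:
    """Upload only if the server does not already hold this exact content."""
    return full_digest(local) not in server_digests


# Two versions of a large document that differ only at the very end.
original = b"chapter text " * 1000 + b"afterword v1"
edited = b"chapter text " * 1000 + b"afterword v2"

same_prefix = prefix_digest(original, 1024) == prefix_digest(edited, 1024)
same_whole = full_digest(original) == full_digest(edited)
print(same_prefix, same_whole)
```

The prefix hashes agree even though the documents differ, so a prefix check would wrongly skip the upload of the edited file; the whole-file hash tells the two versions apart.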

Oh, and I think we should call the modules to do all this "OpenrsyncACS" and "OpentorrentACS".