Forum OpenACS Improvement Proposals (TIPs): Re: TIP #94 (proposed): Add sha1 hash to cr_revisions

Posted by Jeff Davis
If you compute the SHA-1 in Tcl, you need to be very careful not to disturb the encoding: the bytes must be hashed exactly as stored, with no character-set or newline translation. In fact, I think using ns_sha1 is not workable, since it requires the whole file to be in memory, and files such as video will be too large. For this to work at all, it would be necessary either to add an ns_sha1_file command or to use an external program. Adding the command to nssha1 would not be that hard, but it would introduce a dependency on the new nssha1 version (although you could check whether the command is available and simply not generate the hash if it is not).
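The core of an ns_sha1_file-style command is streaming the file through the hash instead of loading it whole. Below is a minimal sketch in Python, chosen only for illustration. The function name `sha1_file` and the 1 MiB chunk size are assumptions, not part of nssha1 or AOLserver. It opens the file in binary mode, so no encoding or newline translation alters the bytes, and reads it in fixed-size chunks, so memory use stays bounded:

```python
import hashlib


def sha1_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of the file at `path`.

    Hypothetical helper illustrating an ns_sha1_file-style command.
    Binary mode ("rb") means the bytes are hashed exactly as stored;
    reading fixed-size chunks keeps memory use independent of file size.
    """
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `sha1_file("/path/to/video.mp4")` would hash a multi-gigabyte file while holding only one chunk in memory at a time. An external program such as `sha1sum` produces the same hex digest.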

Also, you can't compute the hash until the user has uploaded the file, so you can't avoid the transfer (your "further changes" point 1), although once the file has been transferred you could offer to symlink it to the existing copy.
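Because the hash only exists after the upload completes, any deduplication has to happen on the server afterwards. Here is a hedged sketch of the symlink idea in Python. It assumes a plain dict as a stand-in for a lookup on the proposed hash column. The `index` dict and the `store_upload` helper are hypothetical and are not OpenACS APIs:

```python
import hashlib
import os


def _sha1_of(path, chunk_size=1 << 20):
    """Stream the file through SHA-1 in binary mode (bounded memory)."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def store_upload(upload_path, index):
    """Keep a finished upload, or replace it with a symlink to an existing copy.

    `index` is a hypothetical dict mapping SHA-1 hex digests to the path of
    the first stored file with that content. The upload has already been
    transferred by the time this runs, so it saves disk space, not bandwidth.
    Returns (digest, was_duplicate).
    """
    digest = _sha1_of(upload_path)
    existing = index.get(digest)
    if existing is None:
        # First time this content has been seen: remember where it lives.
        index[digest] = upload_path
        return digest, False
    # Duplicate content: discard the new bytes and point at the stored copy.
    os.remove(upload_path)
    os.symlink(os.path.abspath(existing), upload_path)
    return digest, True
```

In a real content repository the lookup would presumably be a query against the stored hashes rather than an in-memory dict.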

Andrew and I looked at how much duplicate content there was in the Sloan file storage, and it looked like less than 20% (by byte count) was duplicated, so I am not entirely sure it's that big a win.
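The post does not say how the Sloan figure was measured. One plausible way to get a duplicate fraction by byte count is sketched below in Python with a hypothetical `duplicate_byte_fraction` helper. It groups files by content hash, counts the first copy in each group as unique, and counts every further copy as duplicate bytes:

```python
import hashlib
import os


def duplicate_byte_fraction(paths, chunk_size=1 << 20):
    """Fraction of total bytes that are redundant copies of other files.

    Hypothetical helper: the first file seen with a given SHA-1 counts as
    unique, and every later file with the same SHA-1 counts as duplicate.
    Returns 0.0 for an empty input.
    """
    seen = set()
    total = dup = 0
    for path in paths:
        size = os.path.getsize(path)
        h = hashlib.sha1()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        digest = h.digest()
        total += size
        if digest in seen:
            dup += size
        else:
            seen.add(digest)
    return dup / total if total else 0.0
```

A common optimization is to hash only files whose size matches at least one other file's, since files of different sizes cannot have identical content.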

Running SHA-1 over all the CR content would probably be pretty fast, even for a reasonably large site. I have done it for 100 GB or so on my desktop machine, and IIRC it only took about 4 hours.
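Taking the quoted figures at face value (roughly 100 GB in about 4 hours, and assuming decimal gigabytes), the implied sustained hashing throughput is:

```latex
\frac{100\ \text{GB}}{4\ \text{h}}
  = \frac{100 \times 10^{9}\ \text{B}}{4 \times 3600\ \text{s}}
  = \frac{10^{11}\ \text{B}}{14\,400\ \text{s}}
  \approx 6.9 \times 10^{6}\ \text{B/s}
  \approx 7\ \text{MB/s}
```

At that rate the time scales linearly with total content, so a site with 10 GB of CR content would take roughly 24 minutes.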