Forum OpenACS Q&A: Response to What is the best way to store long texts in PostgreSQL?

Adding to what Jun said file storage is just a relatively lightweight wrapper around the CR. Check out cr_import_content in acs-content-repository/tcl/revision-procs.tcl. It only handles stuffing in file content into the file system or a large object (LOB in Oracle, PG driver hack in PG). It is assumed the file content is of arbitrary type so can't be stored directly as text or CLOB.

There is no corresponding utility to import a known text file or form entry as CLOB or text (the CR allows for storing text directly in PG). There should be, though. The cr_import_content utility itself should handle known text files (i.e. pass in a switch that tells it the file's known to be text). And an additional utility should be written to take a Tcl string and stuff it in the CR.

In general I'd say store known text as text unless you store items in the file system rather than the db. In Oracle, it's still stored as a LOB. In PG storing known text in the revision's text column should be considerably more efficient than using the driver LOB hack. The LOB hack uuencodes content (which is considered to be arbitrary binary content) on input and uudecodes on output, which is relatively expensive. The encoded data is 33% longer than the original, too. It does this because there's no general PG API for inserting arbitrary binary data outside of the rather strange large object implementation which had lots of problems when we started this project (at the time you couldn't make a consistent dump of your db if you used PG large objects, sort of a killer and that was just one problem).

As far as the psql seg fault ... please report that to the PG folks!

I'd recommend making a folder for the bug tracker instance (perhaps every project in the bugtracker instance?). Why? We've been talking about hooking webdav up to the CR, for one thing, and this would allow for remote browsing and viewing and (most importantly) editing of content outside of a browser.

The CR is admittedly awkward to use but isn't that bad once you play with it some. General routines like cr_import_content help a lot.

In 4.7 we should remove fs_root_folders and move the package id into cr_folders. If not objects themselves. The question "which package created this folder?" or "which package created this object?" are increasingly common ones (notifications needs the latter, for instance) ...

In 4.7 I also want to expose a bunch of the existing CMS "build content" Tcl API at a higher level and Lars has talked about even more ambitious efforts to make the CR (and objects in general) easier to use.