Forum OpenACS Q&A: PostgreSQL Large Object functionality
I've added some large object functionality to the PostgreSQL
driver. Diffs to the latest? version of the as-pg driver are
Ben (or whoever it is that maintains the as-pg driver package), you're
more than welcome to add it in to the main distribution if you think
it's worthy ;). I don't think there's more for me to do on it
(except perhaps force transaction begin/end).
Now, your interface will most probably prove very useful once Postgres moves to interesting LOB functionality. I'll let Don comment on the details since he is "the man" for this kind of stuff.
functionality/interface (at least through the psql functions)
will be preserved? (I wouldn't expect the C functions to
remain the same.)
I wrote this code because I needed to use it for a project I'm working on -- DNA sequences can go over a million bytes -- so I will happily
commit to adapting the code to newer versions of PostgreSQL, once
they're out. (Since the project needs to be marginally stable I
don't want to commit to writing it for the beta versions of PG.)
I would also welcome comments on the interface - I'm relatively new at this stuff, and as long as I'm writing it I might as well put a bit more effort in and make it useful and extensible.
So I'd recommend against including your interface to existing PG large objects.
They're both a dead-end because Postgres V7.1 will have true large types that will automatically be mapped to segmented tables (ala the current LO type, but efficiently rather than stupidly) when they're large enough. These types can optionally be compressed, too.
If I get time to dig more deeply into Postgres internals this summer or fall, I plan to layer SQL92-compliant BLOBs on top of this new implementation of large types, for compatibility reasons (the native PG types will be far more useful for folks not worried about portability).
Unlike the current large object implementation, the new large types will be dumpable and restorable (I'll make sure of that).
Current large objects suffer from a terrible problem, they needlessly use the OID as the key for each object. This is the main reason you can't dump/restore the things. Someone recently wrote a hack that allows dumping, non-synch'd with the rest of your database. As far as I'm concerned, though, if you need to knock down all of your users to make a consistent backup you might as well punt and just tar the database directory (after knocking down the postmaster).
You are doing something like this to make your backups, I hope? I.E. you realize they're not dumpable/restorable and that you gotta knock down your postmaster before you backup the database from the filesystem?
Titus, the reason I say that your driver addition will be a dead-end is that the new 7.1 large types won't require the use of special lo_ calls in order to use them.
There WILL have to be ways to insert and update binary fields, and current functionality for byte arrays suck. I'm presuming this will be improved because real large objects will be based upon them. I may offer to help out if necessary, I've worked with Jan Wieck on referential integrity, it was fun, and he's the implementor for the new large objects.
When this functionality is added, we will need some driver work to stuff binary data into the database using the new large types. Our mutual efforts are both dead ends because the code will be new.
New and SIMPLER than either of our efforts...
And in PostgreSQL 8, there will be a feature....
I truly appreciate the work being done on PostgreSQL, and on the various future extensions that it will support, but it's difficult for me to tell my employers that they have to wait 6 months or 2 years or however long it will take PostgreSQL to support good large objects, before they can have their Web site.
And, even if I was capable of helping with the PG stuff, which I'm not, I'm supposed to be working on the Web site, not the database ;).
I would argue that the development of the driver should partly focus on supporting current features in the database so that non-C-hacker-type-pople can use the database to the fullest. Tying the addition of new features in the _driver_ to the _future_ _reimplementation_ of the feature in the database (however good and necessary that reimplementation will be) isn't going to attract developers, because they need the functionality *now* and will accept the shortcomings.
(Brings to mind that poor guy on the PG mailing list who was using PHP to split his text fields into 6k chunks to save them ;).
I eagerly await the new implementation of large objects -- and my employers are very happy with my current implementation of their Web site.
P.S. I now completely agree that these extensions have no place in future pg-as distributions! But I don't see any harm in my making them available ;).
Is there some forum or archive that I'm missing out on??
The only reason I didn't add real BLOBs to PG rather than just hack the driver is that I know real large types are on the horizon...