Forum OpenACS Q&A: PostgreSQL Large Object functionality

Collapse
Posted by Titus Brown on
Hi,

I've added some large object functionality to the PostgreSQL
driver.  Diffs to the latest? version of the as-pg driver are
available at

http://www.idyll.org/~t/www-tools/

Ben (or whoever it is that maintains the as-pg driver package), you're
more than welcome to add it in to the main distribution if you think
it's worthy ;).  I don't think there's more for me to do on it
(except perhaps force transaction begin/end).

Best,
--titus

Collapse
Posted by Ben Adida on
Well, we don't want to use the current LOBs in Postgres because, quite frankly, they are horrible. We're trying to use varchar for text LOBs since with compression and larger blocks, we can easily get 32K of data into a varchar, and that's really all we need (a textarea in Netscape is limited to 32K). As for BLOBs, we're using Don Baccus's latest hack which "simulates" BLOBs. This is important because current LOBs don't get dumped in pg_dump operations, and Don's hack uses normal byte array fields which is much more interesting from a backup standpoint.

Now, your interface will most probably prove very useful once Postgres moves to interesting LOB functionality. I'll let Don comment on the details since he is "the man" for this kind of stuff.

Collapse
Posted by Titus Brown on
Understood - I would presume, though, that much of the same
functionality/interface (at least through the psql functions)
will be preserved?  (I wouldn't expect the C functions to
remain the same.)

I wrote this code because I needed to use it for a project I'm working on -- DNA sequences can go over a million bytes -- so I will happily
commit to adapting the code to newer versions of PostgreSQL, once
they're out.  (Since the project needs to be marginally stable I
don't want to commit to writing it for the beta versions of PG.)

I would also welcome comments on the interface - I'm relatively new at this stuff, and as long as I'm writing it I might as well put a bit more effort in and make it useful and extensible.

cheers,
--titus

Collapse
Posted by Don Baccus on
Titus ... your driver hack, along with my BLOB hack that was designed expressly for the ACS, are both dead-ends I'm afraid.  My hack ended up in the main AOLserver release of the Postgres driver, but if it hadn't I wouldn't've minded (and it's conditionalized with #ifdef ACS directives).  When 7.1 large types come out (see below) we'll modify the ACS to use them, and I'll rip my temporary hack out of the Postgres driver.  Because I knew it was a temporary hack to get us up and running, and no more, I only implemented the two elements of BLOB functionality currently used by the ACS and didn't bother to generalize the interface.

So I'd recommend against including your interface to existing PG large  objects.

They're both a dead-end because Postgres V7.1 will have true large types that will automatically be mapped to segmented tables (ala the current LO type, but efficiently rather than stupidly) when they're large enough.  These types can optionally be compressed, too.

If I get time to dig more deeply into Postgres internals this summer or fall, I plan to layer SQL92-compliant BLOBs on top of this new implementation of large types, for compatibility reasons (the native PG types will be far more useful for folks not worried about portability).

Unlike the current large object implementation, the new large types will be dumpable and restorable (I'll make sure of that).

Current large objects suffer from a terrible problem, they needlessly use the OID as the key for each object.  This is the main reason you can't dump/restore the things.  Someone recently wrote a hack that allows dumping, non-synch'd with the rest of your database.  As far as I'm concerned, though, if you need to knock down all of your users to make a consistent backup you might as well punt and just tar the database directory (after knocking down the postmaster).

You are doing something like this to make your backups, I hope?  I.E. you realize they're not dumpable/restorable and that you gotta knock down your postmaster before you backup the database from the filesystem?

Collapse
Posted by Don Baccus on
I guess I could be a little more specific...

Titus, the reason I say that your driver addition will be a dead-end is that the new 7.1 large types won't require the use of special lo_ calls in order to use them.

There WILL have to be ways to insert and update binary fields, and current functionality for byte arrays suck.  I'm presuming this will be improved because real large objects will be based upon them.  I may offer to help out if necessary, I've worked with Jan Wieck on referential integrity, it was fun, and he's the implementor for the new large objects.

When this functionality is added, we will need some driver work to stuff binary data into the database using the new large types.  Our mutual efforts are both dead ends because the code will be new.

New and SIMPLER than either of our efforts...

Collapse
Posted by Titus Brown on
<humor>

And in PostgreSQL 8, there will be a feature....

</humor>

I truly appreciate the work being done on PostgreSQL, and on the various future extensions that it will support, but it's difficult for me to tell my employers that they have to wait 6 months or 2 years or however long it will take PostgreSQL to support good large objects, before they can have their Web site.

And, even if I was capable of helping with the PG stuff, which I'm not, I'm supposed to be working on the Web site, not the database ;).

I would argue that the development of the driver should partly focus on supporting current features in the database so that non-C-hacker-type-pople can use the database to the fullest.  Tying the addition of new features in the _driver_ to the _future_ _reimplementation_ of the feature in the database (however good and necessary that reimplementation will be) isn't going to attract developers, because they need the functionality *now* and will accept the shortcomings.

(Brings to mind that poor guy on the PG mailing list who was using PHP to split his text fields into 6k chunks to save them ;).

I eagerly await the new implementation of large objects -- and my employers are very happy with my current implementation of their Web site.

Cheers,
--titus

P.S.  I now completely agree that these extensions have no place in future pg-as distributions!  But I don't see any harm in my making them available ;).

Collapse
Posted by Titus Brown on
And, incidentally, I had no idea that there was Large Object functionality in the postgres driver distributed with AOLserver!
Is there some forum or archive that I'm missing out on??

sigh.

--titus

Collapse
Posted by Ben Adida on
You're welcome to use the current large object functionality, for sure. We're just saying that the current functionality is poor. There is no backup support. Each LOB is stored in an individual file. That makes for poor scalability. We've chosen to hold off until a better LOB system comes along...
Collapse
Posted by Don Baccus on
And I'm certainly not saying YOU shouldn't use your changes, simply saying that I don't think the driver enhancements should go into the standard driver release.

The only reason I didn't add real BLOBs to PG rather than just hack the driver is that I know real large types are on the horizon...

Collapse
Posted by Don Baccus on
(the YOU in my previous message refers to Titus...)