Forum OpenACS Q&A: Any way to speed up inserts of multiple rows?

I am getting a speed of about 5 msec per insert just calling db_dml {insert into foo...} once per insert. Is there any way to speed up inserts of many rows to a table? I can batch up groups of inserts in my application, but I am trying to figure out if there is any way to get a postgres client to insert faster. Note that I am not restricted to AOLserver's postgres driver, I could use Perl DBI if that would speed things up somehow. The data is being collected by a separate daemon process anyhow. Thanks
Collapse
Posted by Don Baccus on
I don't think it's the AOLserver driver, as it is very lightweight.

I get faster inserts in Oracle vs. PG when doing such tests as migrating ACS 3 forum contents into OpenACS 4.  Much faster (in Oracle I can migrate the openacs.org forum contents about 3x faster than in PG) so I think this exonorates the db_dml Tcl code to some extent, at least.

So that leaves us wondering about PG's speed, right?  Something you might do is to check to see if the machine's CPU or I/O bound.  If I/O bound try things like moving the write-ahead logs to a different disk, index files to a different disk, etc.  You might try turning off fsync for testing purposes as this gets rid of a lot of file I/O.  This would give you baseline data on insert speed (without as much safety, but it would help to pinpoint where your problems lie).

Look into the COPY command, since you can bunch of the data.  COPY is known to be *much* faster than INSERT.

Oh, yes, before doing that even wrap the bunched-up INSERTs in BEGIN/END to make them all happen in a single transaction.  This won't speed things up as much as it did in <PG 7.1 but should still speed things up some.

But the fastest thing to do is COPY.

These are just some simple, top-of-the-head ideas.

Collapse
Posted by Henry Minsky on
Thanks for the bits, Don. Putting everything into a single
transaction was good for a factor of two or more in my case.

I'll look into the COPY command. I think the Perl DBI interface
has a lot less CPU overhead than preparing SQL statements in
ACS/Tcl, so that is also a factor. I am getting something like 500
rows/second inserted via OpenACS, vs. 2000 per second using Perl DBI
(on a 1.4 GHz Linux box with single IDE drive).

Luckily, this is pretty much fast enough for what I'm trying to
do for the moment.

Henry, what command sequence are you using with Perl DBI?  I ask b/c your comparison between ACS/tcl & Perl DBI is obscured by Perl DBI being such a mis-mash of backwards compatibility and misfeatures that everyone uses it differently.  I'd like to see where and how you get such a boost.
Collapse
Posted by Don Baccus on
We do substitution on SQL queries in TCL before passing them off to the driver, that could be one source of the difference.  You could turn this off easily enough by finding the "subst" command that does this.

We also do massaging of Tcl variables passed in as pseudo-bind vars, double-quoting and escaping backslashes.  That can be avoided by using literals directly.