Forum OpenACS Q&A: Insert Performance Issues - PG 7.2.3/OpenACS 4.5/perl/dbi
I am currently developing a data aggregation system running on PostgreSQL 7.2.3 and OpenACS 4.5 that takes inputs from external sources (subscription business databases). I decided to go with OpenACS because of the user interface capabilities which I felt would come in handy down the line.
At this point, I have:
a inherits from acs_objects
x inherits from a
y inherits from a
using the example of the Notes application.
My issue is that when I do big data acquisitions, I drop the indices and vacuum the database (with no switches), yet each add (i.e. x__new) still takes 6-7 seconds, which I think can be improved upon.
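For reference, the load cycle I'm describing looks roughly like this (the index and column names below are placeholders, not my real schema):

```sql
-- placeholder names, for illustration only
drop index x_lookup_idx;

-- ... bulk inserts via perl/dbi, each calling x__new ...

create index x_lookup_idx on x (some_column);
vacuum;  -- plain vacuum, no switches
```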
I am currently pulling in data using perl/dbi on a RedHat 8 machine, P3/600 with 384MB and an old 10GB 5400rpm drive.
I was thinking...
1. When I did the install, I was only running with 128MB RAM, it is now 384MB. Does Postgres require reconfiguration of some sort to take advantage of this? I skimmed the PostgreSQL admin manual and didn't really find anything interesting about caching.
2. When I do a 'ps', it tells me that it is using roughly 82% of the processor. Is it time to upgrade the CPU? Is it time to go dual processor, possibly?
3. The hard drive is older and I just ordered a 7200rpm 80GB drive. Will that help me significantly?
4. x__new calls a__new calls acs_object__new, which is fairly expensive, I would imagine. Would it be wise to consider using the PostgreSQL inheritance instead?
Anybody have any experience with this type of thing?
1. Increase the wal_files setting. The default is 0; I usually increase it to 3 or 4.
2. increase the sort_mem from 512 to 2048; this means "use up to 2MB RAM for in-memory sorting before dumping the data to disk and using the slower disk-based sorting routine". 512KB is not enough.
3. Check the fsync parameter - this varies by platform. Turning it off is less safe but faster.
4. Increase the shared_buffers setting.
5. Make sure checkpoint_segments is set to a number greater than 0.
Remember to restart the server after each change and retest. If you use pg_ctl, check the actual logfile: it has a bug where it reports success even when the server exits because of a misconfigured .conf file.
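As a sketch, the relevant postgresql.conf lines for PG 7.2 might look like this (the values are illustrative starting points, not a tuned recommendation for your box):

```
# postgresql.conf (PG 7.2) -- illustrative values only
shared_buffers = 4096       # in 8KB blocks; 4096 = 32MB (the default is only 64 blocks)
sort_mem = 2048             # KB per sort before spilling to slower disk-based sorting
wal_files = 4               # pre-allocated extra WAL files (default 0)
checkpoint_segments = 3     # each segment is 16MB
fsync = false               # faster, but unsafe if the machine crashes
```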
With 384MB of memory you can increase sort_mem well above 2MB, though even 2MB is a big improvement over the default.
But the shared buffer pool must be greatly increased. It's mentioned above but its importance isn't stressed: by default Postgres caches only 64 8KB blocks (512KB) in shared memory, and with that much RAM you should increase shared_buffers to a much larger value.
Randy's suggestions are good, too - don't make things objects unless they need to be, which normally implies you need the auditing information built into objects (creation time, user, etc), you need to explicitly manage permissions on the object etc.
We don't use Postgres inheritance because it is, in essence, broken. There's a document lying around in file storage that explains why; the reasons are many.
After thinking about it some, I think that at least one of my data types doesn't need to be an acs_object. I appreciate the warning regarding inheritance.
As I was trying to build a new data model to shift everything over, I decided to look up the acs_object in the schema browser.
create sequence t_acs_object_id_seq;
create view acs_object_id_seq as
select nextval('t_acs_object_id_seq') as nextval;
create table acs_objects (
    object_id integer not null
        constraint acs_objects_pk primary key
This indicates that a sequence is created explicitly, when supposedly the same thing is possible with the "serial" and "bigserial" datatypes. Do the serial/bigserial datatypes work?
I tried to create a table:
create table test (
and then attempted an
insert into test values (null, 2);
which fails. I'm most familiar with MySQL, where an "integer auto_increment" column generally accepts that.
Can anybody briefly describe why using serial/bigserial is a bad idea (i.e., is it like using PG inheritance?)
We use sequences in our PG code mostly because we support both Oracle and PG, and using sequences leads to our being able to share a lot more queries between the two than would be true if we used the serial datatype in PG. Our datamodels for Oracle and PG are more closely alike, too, as a consequence of this decision.
Sequences are also more flexible in practice, and PG's serial datatype is implemented using sequences.
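To illustrate with a hypothetical table (not OpenACS code): a serial column is just an integer with a sequence-backed default, and the way to get an auto-assigned value is to omit the column rather than insert NULL:

```sql
create table demo (
    i serial primary key,  -- shorthand for: integer not null default nextval('demo_i_seq')
    j integer
);

-- works: i is assigned from the sequence
insert into demo (j) values (2);

-- fails: an explicit NULL overrides the default, and i is NOT NULL
insert into demo values (null, 2);
```

That's also why your earlier test failed: NULL isn't a request for the default, unlike MySQL's auto_increment behavior.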
In regard to your example, it has been so long since I've experimented with the serial datatype in PG that I don't remember the details of how it works ...
I've created a table:
create table test2 (
I'm trying to build a new function, "test2__new" which will insert a new string into the table and automatically give it a new i value.
create function test__new (varchar)
return integer as '
    new__text alias for $1;
    insert into test2
end; ' language 'plpgsql';
Whenever I try to load this new function, it always gives me "parse error at or near "return"". I am not sure what I'm missing.
Also, I'm trying to find some code that exports an SQL query to Excel/CSV via the OpenACS interface. I did some searching and found that it is supposed to be a feature of the simple-survey module of dotLRN. After downloading dotLRN 1.0, I haven't really been able to isolate it. Can anyone suggest where I might be able to find this type of functionality?
You need to declare v_index_id in your function, like this:
    v_index_id test2.i%TYPE;
(the %TYPE attribute picks up the type from test2's i column automatically)
Other than that I don't see any error in your function but I could be wrong, of course ...
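Putting it together, a working version might look like the sketch below. I'm assuming test2 has columns i (integer) and s (varchar) and a sequence named t_test2_seq; adjust the names to your actual schema. Note also that the header keyword is "returns", not "return", which is probably what the parser was complaining about:

```sql
-- assumed schema, for illustration:
--   create sequence t_test2_seq;
--   create table test2 (i integer primary key, s varchar);
create function test2__new (varchar)
returns integer as '            -- "returns", not "return"
declare
    new__text alias for $1;
    v_index_id test2.i%TYPE;    -- same type as test2.i
begin
    select nextval(''t_test2_seq'') into v_index_id;
    insert into test2 (i, s) values (v_index_id, new__text);
    return v_index_id;
end;' language 'plpgsql';
```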