Forum OpenACS Q&A: Re: referencing a specific db pool
This is something that is just crying to be done in pl. I moved a select heavy procedure from tcl to plsql and saw the time go from 8 hours to 8 minutes on a process we wanted to run ever two hours, and also greatly increased the functionality. It also really lowers the load on the server.
But there isn't anything really bad about a ten minute query if it is only run once a month, and you are only able to lower it to four minutes.
might consider doing it all with a COPY command (you can
look at db_load_sql_data and ref-timezones for examples
of loading data with copy).
I think what I did was setup a controller function and then another worker function which ran each operation. It had to be done this way for the similar reasons: user data might crash the process, but in that case only a single record would be affected. So in the worker function, catch the exception, and return 0 or 1 based on success/failure. On failure, run another function to do your logging.
This is always the case in pl, even if you can catch an exception, you cannot continue in the body of the function and you have to cleanup and return.
Moving stuff around inside the database is always going to be much quicker than going back and forth to the webserver for control.
I know it works that way in plsql, and plpgsql is very similar, you add an exception block to the bottom of the function, something like:
EXCEPTION WHEN OTHERS THEN return 1; END;
WHEN OTHERS clause handles all exceptions not otherwise handled. If you exect certain types of errors and want to distinguish between them, you need to include them first.
If you have several dml statements that need to be executed, and you want as many of them to succeed as possible, you can follow the advice of _Oracle PL/SQL Programming, page 266-267 and have each statement in its own block:
BEGIN BEGIN DELETE FROM employee WHERE ...; EXCEPTION WHEN OTHERS THEN NULL; END; BEGIN UPDATE company SET ...; EXCEPTION WHEN OTHERS THEN NULL; END; END;
Using this method, you could do away with the second function, and use a sub-block to do the work:
LOOP BEGIN ...; EXCEPTION WHEN OTHERS THEN NULL; END; END LOOP;
This works in pl/sql, you might try a simple example in pl/pgsql.
Postgres also has no exception handling - you can run a transaction on exception to log it or similar, but you cannot prevent the rollback.
Now that I have added the second (simpler but more data) part of the end of month process, processing time has blown out to 30 mins - and the amount of data will increase as sales increase.
I am going to try parallelizing the two which should reduce the overall execution time.
Or, I guess I just abandon my idea and manually enforce the business rules as well as allowing constraints to do their thing and catch the errors.
Such a nice idea though...
Here are the results of some benchmarks we did inserting 30k rows into a table, using "time psql -f blubb.sql -q dbname": File kingfisher skate 30kinsert.sql 39.359s 762r/s 335.024s 90r/s 30kcommonce.sql 11.402s 2631r/s 7.086s 4233r/s 30kwithids.sql 10.138s 2959r/s 6.936s 4325r/s 30kprepare.sql 8.173s 3670r/s 5.189s 5781r/s 30kdump.sql 1.286s 23328r/s 0.785s 38216r/s 30kdumpseq.sql 1.498s 20026r/s 0.927s 32362r/s Kingfisher is the single processor machine I mentioned yesterday night, skate the SMP machine. The table has five rows (bigint, bigint, double, double, timestamp without time zone). The first of them has a "default nextval('sequence' ::text)" definition, and there are no further constraints or indices. The 30kinsert.sql uses simple insert statements with autocommit on, and we insert all but the first column which is filled by the default sequence. With this test, kingfisher seems to have an irrealistic high value of commits (syncs) per second (see what I wrote yesterday) , skate has a more realistic value. 30kcommonce.sql, as suspected, gives a rather high boost by encapsulating all into a single commit statement. 30kwithids gives a small boost by inserting pre-calculated sequence numbers, so it seems not worth the effort to move this logic into the application. 30kprepare prepares the insert statement, and then issues 30k EXECUTE statements within one transaction, the speedup is noticeable. 30kdump simply inserts the 30k rows as a dump via COPY FROM STDIN. (as with 30kwithids, the first column is contained in the insert data, so the default value sequence is not used). Naturally, this is by far the fastest method. 30kdumpseq.sql uses COPY, too, but omits the first column and such utilizes the sequence generation again. This gives a noticeable 15% slowdown, but seems to be still fast enough for our purposes. Sadly, it is not available within jdbc. Thanks for your patience. Footnotes:  We suspect this to be some strange interaction between ide, cryptoloop and ext3fs, so that the sync() call somehow does not really wait for the data to be physically written to the disk. (I really can't imagine a crypto-looped notebook harddisk to do more syncs/second than a SCSI-Based RAID in a server machine. We did some small benches on the sync() / fsync() calls that seem to prove this conclusion.) -- markus schaber | dipl. informatiker logi-track ag | rennweg 14-16 | ch 8001 zürich