Forum OpenACS Development: Limitation on length of bind variables
WHERE key IN (blah, blah, blah)
and the values in the IN clause were literals. The list of literals is limited by Oracle to 1000 items, so the code was breaking up the list into 1000 item chunks and invoking the query multiple times.
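The chunking that old code did can be sketched like this (a Python stand-in; the query-building is purely illustrative):

```python
def chunks(items, size=1000):
    """Yield successive slices of at most `size` items (Oracle's IN-list cap)."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

ids = list(range(2500))
queries = [
    "WHERE key IN (%s)" % ", ".join(str(i) for i in chunk)
    for chunk in chunks(ids)
]
# 2500 ids -> three chunks of 1000 + 1000 + 500, so three query invocations
```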
Unfortunately, you can't just replace the literal list of values with a bind variable; Oracle interprets the whole thing as one giant string and you're sunk. After doing some research online and looking it up in Tom Kyte's book, I came upon a solution. You do a little setup:
create or replace type myTableType as table of number;

create or replace function str2tbl( p_str in varchar2 ) return myTableType
as
    l_str  long default p_str || ',';
    l_n    number;
    l_data myTableType := myTableType();
begin
    loop
        l_n := instr( l_str, ',' );
        exit when (nvl(l_n,0) = 0);
        l_data.extend;
        l_data( l_data.count ) := ltrim(rtrim(substr(l_str,1,l_n-1)));
        l_str := substr( l_str, l_n+1 );
    end loop;
    return l_data;
end;
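The PL/SQL function just walks the string splitting on commas and trimming whitespace; a rough Python equivalent of the same algorithm:

```python
def str2tbl(p_str):
    """Split a comma-separated string into a list of numbers,
    trimming whitespace around each element (mirrors the PL/SQL above)."""
    return [int(piece.strip()) for piece in p_str.split(",") if piece.strip()]

str2tbl("1, 2,  3,4")  # -> [1, 2, 3, 4]
```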
And then your WHERE clause looks like this:
WHERE key IN (select * from table ( cast ( str2tbl(:id_list) as myTableType)))
where id_list is the same comma-separated list of ids as before.
This runs much faster, but there's a catch I don't like. The length of the value in $id_list is limited to 4000 characters. That means, in this case, I can only have 500 items in the list and have to invoke the query twice as often. It's also more difficult to split up the ids into chunks since it's no longer just about the number of ids. This solution helps, but it would be even better if I could get rid of this limitation.
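Chunking by character budget rather than by item count could look something like this (a hypothetical Python helper, assuming the 4000-character limit described above):

```python
def chunk_by_length(ids, max_len=4000):
    """Group ids so each comma-joined chunk stays within max_len characters."""
    chunks, current, current_len = [], [], 0
    for i in ids:
        piece = str(i)
        # +1 accounts for the comma separating this id from the previous one
        added = len(piece) + (1 if current else 0)
        if current and current_len + added > max_len:
            chunks.append(current)
            current, current_len = [], 0
            added = len(piece)
        current.append(piece)
        current_len += added
    if current:
        chunks.append(current)
    return [",".join(c) for c in chunks]

parts = chunk_by_length(range(1000), max_len=100)
# every chunk fits the budget, and joining them back preserves the id order
```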
I can't find anything that says that bind variables are limited to 4000 chars, and I don't see any reason why I'm running into this limit. A varchar2 in PL/SQL can hold up to 32K, so it's not a limit of the function. Could it be a limitation of our Oracle driver?
The exact error is
ORA-01460: unimplemented or unreasonable conversion requested
And it happens any time the string in the bind variable is 4001 characters or longer.
I have tried implementing this with a temporary table instead; that is, inserting all of the ids into a temporary table and then the where clause becomes
WHERE key IN (select id from temp_table)
But this takes nearly twice as long to run.
Any suggestions on how to get around the apparent 4000 char limit on a bind variable?
Varchar2 in OCI is limited to 4000 characters. This limitation has not been lifted with Oracle 9.2.
(Bind variables can be longer otherwise, think of blobs or clobs.)
There is a maximum for IN lists anyway: it is 1000 literals.
Temporary tables are not a good idea: we tried this on AIESEC.net, and the poor temporary tablespace got flooded with activity. You trade re-parsing for hundreds of little insert statements per select query.
So the suggestion is to get rid of passing IN lists around altogether.
Unfortunately getting rid of the long IN lists would require a more substantial rewrite than I'm chartered to do, so I guess Tom Kyte's solution will have to do. It is fast enough now, I just worry how it will degrade as the lists get even longer.
I hope that Don's rewrite does away with this.
Since in your case it's much faster, that must mean that the PL/SQL overhead of converting the string is much lower than the time burned up in re-parsing the query with the embedded string literals every time? The only difference between your two versions of the query is the parsing of string literals vs. the parsing of a single string passed as a bind variable, right?
The str2tbl PL/SQL function above is basically a little custom special-purpose query parser, so at first glance it seems odd to me that it would be substantially faster than Oracle's C-coded general-purpose SQL query parser. Hm, it must be that the SQL parser is doing much more work every time - trying to optimize the query plan from scratch, etc. - while the PL/SQL function is doing only one specific simple thing and not invoking all that other, in-this-case-useless, overhead of the query parser.
One way around the limitations on the length of an IN list is to do a really ugly UNION ALL of thousands of "select 'FOO' from dual" statements in an inline view. AFAIK there is no limit on how many such subselects you do, so you can create IN lists of arbitrary size that way. Of course, that makes a rather heinously ugly query, but surprisingly enough it usually seems to work ok. However, in one case for a particular (non-OpenACS) query with 1500 integers in the IN list, I recently compared that to using an Oracle temporary table, and the temporary table was consistently about twice as fast.
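Generating that inline view mechanically might look like this (a Python sketch; `dual_inline_view` is a hypothetical helper name):

```python
def dual_inline_view(ids):
    """Build the 'select ... from dual union all ...' inline view for a list of ids."""
    selects = ["select %d as id from dual" % i for i in ids]
    return "( " + " union all ".join(selects) + " ) ids"

# joining the generated view against the real table replaces the IN list
query = ("select t.* from table1 t, %s where t.id = ids.id"
         % dual_inline_view([7, 8, 9]))
```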
By "temporary table" I mean one defined like this:
create global temporary table foo (...) on commit delete rows;
I never tested many concurrent runs of my query using a temporary table, as that was not relevant to our application. So I don't know whether I would have seen temp. tablespace contention like what Dirk reported for aiesec.net.
Dirk, was your aiesec.net temporary tablespace locally managed or dictionary managed? The default in Oracle 8i is dictionary managed, which generally sucks and thus is never what you want. (9i changed the default to locally managed tablespaces.)
Since temporary tables are Oracle's designed solution for this sort of query, I would not expect them to ever have significantly worse performance than other more hackish solutions, when Oracle is configured correctly. So if aiesec.net's Oracle instance was definitely configured correctly - locally managed temp. tablespace, etc. - and yet still had lousier performance when using a temp. table in the query than other seemingly uglier solutions, that would be particularly interesting. I'd like to know for sure. Dirk?
Actually, I don't see why values inserted into a temporary table would ever go to disk at all, while values inserted into a PL/SQL table would stay in memory. Perhaps that also is tweakable in some way? If doing a query that uses Janine's str2tbl PL/SQL function above (possibly multiple times) turns out to really always be superior in performance to using temporary tables, for all IN-list sizes, that would be useful to know!
(Of course, as you all know and mention above, the best solution is to eliminate the in-lists entirely in some fashion, and instead pass along a specification defining how to do the query that created the big long in list in the first place. But temp. tables are still very handy in many cases.)
The "WHERE foo IN (list of literals)" approach is transformed into a bunch of OR expressions - which are always slow as hell in SQL implementations.
Now we may want to rewrite some of these portlets to operate, say, on an acs-rel that provides a mapping of a single object to the list of objects the portal is intended to operate on ... but that's neither here nor there as far as the portal package itself is concerned. Portals can be as smart or stupid as they want!
Andy, I should have been more precise. The hack on AIESEC was like this: for every item of the IN list, there was an insert into the temporary table. Hundreds of little inserts (in a loop, no less - gosh) vs. one reparse - guess what won.
On ShareNet the code was smarter: there we used temporary tables very successfully. They are generally handy and a great tool, I agree.
AFAIK, if you already have, say, 2000 ids in an Oracle table somewhere, you can use "insert into TABLE (select FOO from BAR)", and if you already have the 2000 ids in PL/SQL you can use FORALL to do a bulk bind and insert (rather than looping and doing 2000 separate insert statements).
But if you have those 2000 literal ids outside of Oracle entirely, there are basically only two ways to get them in: Either do 2000 insert statements, or use the 2000 literal ids in one big honking query. There isn't any way to take 2000 literal values outside of Oracle and insert them all at once with a single SQL statement. If I'm wrong about that and there is, please let me know!
In the example where I tried using a temporary table above, I did do all 2000 inserts (from sqlplus) into the temporary table - that was the only way to get them in. (And actually, I didn't even use bind variables in the insert statements, although I could have!) For the case I tested, that (including all the inserts) ended up about twice as fast as sticking all the literal ids into a query of the general form:
select ... from table1,
     ( select 1 as id from dual
       union all select 2 as id from dual
       union all ...
       select 1500 as id from dual ) table2
where table1.id = table2.id

I didn't try the "parse a string into a PL/SQL table" approach.
PG is not smarter about this, in fact it only recently got smart enough to do the pseudo-JOIN code for "WHERE foo IN (SELECT ...)".
When I tested the temporary table I did the same thing you mention - looped through the array and inserted each value (although I did use a bind variable to do it :). I found that I had to create my temporary table to preserve rows instead of deleting them, because when the table was created to automatically delete rows they were gone when I went to select them. So I had the additional overhead of deleting them afterwards.
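The loop-and-insert approach just described (one bound insert per id, then a subselect against the temporary table, then manual cleanup) can be sketched with SQLite as a stand-in, since the shape of the code is the same; table and column names here are made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table data (key integer, val text)")
conn.executemany("insert into data values (?, ?)",
                 [(i, "row%d" % i) for i in range(10)])

# temporary table holding the id list (rows preserved until we delete them)
conn.execute("create temporary table temp_ids (id integer)")
ids = [2, 3, 5, 7]
for i in ids:  # one insert per id, with a bound variable each time
    conn.execute("insert into temp_ids values (?)", (i,))

rows = conn.execute(
    "select val from data where key in (select id from temp_ids) order by key"
).fetchall()
conn.execute("delete from temp_ids")  # the manual cleanup described above
```

In Oracle PL/SQL the per-row loop could be replaced by a FORALL bulk bind, as mentioned earlier in the thread.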
I ran this:
SQL> exec runstats_pkg.rs_start;

PL/SQL procedure successfully completed.

SQL> begin
  2    for i in 1 .. 1000
  3    loop
  4      insert into test values (i);
  5    end loop;
  6  end;
  7  /

PL/SQL procedure successfully completed.

SQL> exec runstats_pkg.rs_middle

PL/SQL procedure successfully completed.

SQL> begin
  2    for i in 1 .. 10000
  3    loop
  4      insert into test values (i);
  5    end loop;
  6  end;
  7  /

PL/SQL procedure successfully completed.

SQL> exec runstats_pkg.rs_stop
Run1 ran in 968 hsecs
Run2 ran in 1201 hsecs
run 1 ran in 80.6% of the time

Name                                  Run1       Run2       Diff
LATCH.FOB s.o list latch                 0          1          1
LATCH.dml lock allocation                1          0         -1
LATCH.list of block allocation           0          1          1
LATCH.ncodef allocation latch            1          0         -1
LATCH.session timer                      3          4          1
LATCH.transaction branch alloc           1          0         -1
STAT...index scans kdiixs1               0          1          1
STAT...parse count (total)               5          6          1
STAT...shared hash latch upgra           0          1          1
STAT...opened cursors cumulati           5          6          1
STAT...bytes received via SQL*         940        939         -1
LATCH.sort extent pool                   1          0         -1
LATCH.session switching                  1          0         -1
LATCH.session idle bit                  12         13          1
LATCH.channel operations paren           6          8          2
LATCH.session allocation                 0          2          2
STAT...table fetch by rowid              0          2          2
LATCH.simulator lru latch                0          4          4
STAT...no work - consistent re           0          4          4
LATCH.Consistent RBA                     3          8          5
STAT...buffer is not pinned co           0          6          6
STAT...enqueue requests                  0          6          6
STAT...messages sent                     0          6          6
STAT...enqueue releases                  0          6          6
LATCH.lgwr LWN SCN                       2          9          7
LATCH.mostly latch-free SCN              2          9          7
LATCH.row cache enqueue latch            0          8          8
LATCH.row cache objects                  0          8          8
LATCH.checkpoint queue latch           178        192         14
STAT...calls to kcmgcs                   5         19         14
STAT...cleanout - number of kt           5         19         14
STAT...consistent gets - exami           5         19         14
STAT...active txn count during           5         19         14
LATCH.undo global data                   6         22         16
LATCH.enqueue hash chains               10         28         18
LATCH.enqueues                          10         29         19
LATCH.redo writing                      18         37         19
STAT...consistent gets                   6         26         20
LATCH.messages                          33         68         35
STAT...CPU used by this sessio           9         53         44
STAT...recursive cpu usage               6         50         44
STAT...CPU used when call star           9         53         44
STAT...free buffer requested             0         62         62
LATCH.cache buffers lru chain            1         64         63
LATCH.SQL memory manager worka         201        268         67
LATCH.simulator hash latch             128        646        518
LATCH.shared pool                    1,083     10,081      8,998
STAT...execute count                 1,005     10,006      9,001
STAT...calls to get snapshot s       1,001     10,003      9,002
STAT...redo entries                  1,504     10,507      9,003
STAT...recursive calls               1,003     10,012      9,009
LATCH.redo allocation                1,510     10,524      9,014
STAT...consistent changes            1,512     10,576      9,064
STAT...db block gets                 1,540     10,758      9,218
STAT...session logical reads         1,546     10,784      9,238
LATCH.library cache                  2,115     20,115     18,000
LATCH.library cache pin              2,066     20,072     18,006
STAT...db block changes              3,029     21,175     18,146
LATCH.cache buffers chains           7,652     53,308     45,656
STAT...redo size                   193,276  1,350,144  1,156,868

Run1 latches total versus runs -- difference and pct
Run1       Run2       Diff       Pct
15,078     115,563    100,485    13.05%

PL/SQL procedure successfully completed.
(You can download the runstats package from here: http://asktom.oracle.com/~tkyte/runstats.html)
Note that 10,000 inserts take 10 times as many latches. A latch is a serialization device in Oracle. The bottom line here is that the more queries per page you run, the more serialization you'll experience - a bad thing on a heavily loaded site. Add to that the overhead of our db_ API.
Also note that inserts into temporary tables generate redo. And redo is always written to the harddisk and thus causes I/O. As to temporary tablespace I/O rocketing up for AIESEC: that may have been caused by inappropriate sort_area_size settings or whatever. (It was a quick-fix situation...)
In ShareNet we used temporary tables as in "insert into temporary_table_name select values from somewhere_else", where the result set was relatively small, was used in 2 or 3 subsequent queries, and the CBO had problems finding a decent access path.
So in a situation with concurrent accesses and ids outside the database, it seems best to use cursor_sharing=force.
This has an index on (element_id, key), so assuming new-portal is passing you the element_id as part of the config array (if not, I'll make sure the rewrite does!), you can essentially say "package_id in (select package_id from portal_element_parameters where element_id = :my_element_id and key = 'package_id')" if the join-list is a list of package ids.
This ought to be faster than the "in (long list of constants)" and of course faster than creating a temp table filled with essentially the same information.
Eliza: if you want to attract attention, offer free Oracle or SQL Server licenses.
OpenACS: there should be some limit on reviving these old threads... maybe just reverse the posting order to blog order... and maybe some indication that half a decade has passed since the last post?
Another idea: cross-link threads. This would require admin/maintainer input, but you could retire a thread, but keep it linked to current discussion on the same topic. Personally I think the ability to search is enough. Retire threads after a year or so.
In the age of Google, you should make old content harder to access, but still available. Maybe old threads should be rolled into an historical FAQ.
IMHO, the ability to dredge up and abuse old content doesn't help promote current OpenACS software and solutions.
All the OpenACS forums need is a small tweak to make the posting date more prominent; the current hard-to-read grey date color is unwise. Perhaps also distinguish very old posts with some sort of "n years old" message next to the date.
Abuse by spammers is a separate and entirely orthogonal topic.