Forum OpenACS Development: Eliminating 90% of DB-specific queries

1: Eliminating 90% of DB-specific queries

Posted by Lars Pind on 02/21/03 04:41 PM

I've been thinking that 90% of the queries that are different between Oracle and PG are because of function calls, which use the package.method notation in Oracle and package__method notation in PG.

I know that this is a slippery slope, but if we could have the query dispatcher automatically do this for us, we could probably eliminate 90% of the DB-specific queries.

Then, of course, the next step would be to automatically deal with things like 'from dual', and with 'now()' vs. 'sysdate', and similar things as well.

Is it worth going down this path?

/Lars

2: Re: Eliminating 90% of DB-specific queries (response to 1)

Posted by Jeff Davis on 02/21/03 05:05 PM

We talked about this before here and in the weekly chat.

I think it is worth a try. I like doing the mapping in the QD; there are a few functions where there might be a few collision on the oracle side ( ttl__create_file, some packages called *__sws, etc) but not a lot of work at all to do.

4: Re: Eliminating 90% of DB-specific queries (response to 1)

Posted by Dave Bauer on 02/21/03 05:14 PM

Sounds good to me. I like the QD doing the work also. I suggest using current_date or current_timestamp for postgresql, just in case Oracle decides to use the standard notation at some point.

3: Re: Eliminating 90% of DB-specific queries (response to 1)

Posted by Eduardo Pérez on 02/21/03 05:16 PM

What do you mean?

Adding PL/pgSQL to Oracle or adding PL/SQL to PostgreSQL?

Could anyone with experience in both languages comment the benefits and
drawbacks of either one?

5: Re: Eliminating 90% of DB-specific queries (response to 3)

Posted by Jeff Davis on 02/21/03 05:22 PM

the safest thing is to map __ to . in oracle queries. Trying to map . to __ but only when it is a function and not a table.column reference is _hard_.

It is not entirely risk free but I think we would see fewer bugs because of this than we now see because we have two
parallel sets of queries when one would suffice.

6: Re: Eliminating 90% of DB-specific queries (response to 1)

Posted by Tom Jackson on 02/21/03 06:42 PM

Has anyone used my query-writer? It started as an attempt to eliminate writing any queries at all, no .xql files for insert/update/delete. Unfortunately it has a bad ui for adding new functions. I started working on an update last week and hope to complete the project in the next few weeks. What it will do is take a table name, one already created in the db and figure out what the insert/update/delete pl code would look like and create it. It figures out if the table pk traces back to acs_objects and if so, uses a function call to create the parent object, then inserts data into the new table. For updating data, It might use regular dml statements and avoid pl/sql altogeather. The package will have an expanded capability to define your own 'tweaked' pl code.

It has built in capability to distinquish between pg and oracle, and would run the correct procedure for creating the pl/dml on the fly.

It is currently pg specific since the main focus was to eliminate the need to remember attribute order, attribute defaults, and even the number of attributes to a function call. It can handle overloaded pg functions without the need for the developer to be aware of which function is being used.

It has an api that can be used in an .xql file, one that can be used in a tcl file, and a url based api as well.

The current package is at http://zmbh.com/query-writer/

7: Re: Eliminating 90% of DB-specific queries (response to 1)

Posted by Don Baccus on 02/21/03 08:18 PM

"from dual" is already dealt with via the "dual" view, as is "select sysdate from dual" for simple cases. The view trick is also used so you can say "select nextval from sequence_name" in PG (though you have to be careful that it's only called once in your query)

However - date semantics are totally different. I don't see much hope there until Oracle tracks the SQL99 date standard (PG is finally getting close to standard in this regard and 7.3 may finally be there)

The differences in the PL/[pg]SQL calls are much different than merely the package vs. non-package name convention.

In Oracle we make heavy use of default values and named parameters in function/procedure calls. Can't have one without the other. Default values are very useful IMO and I'd hate to see us give them up in our Oracle source just because PG doesn't support them.

And on the PG side at times we've had to disambiguate with explicit casts (rarely) or inventing new overloaded function defs (more commonly) when writing several shorthand forms of functions or functions that do slightly different things (like in the CR regarding file/lob/etc storage.) The problem we have is that the bindvar hack in the PG driver surrounds values in quotes which means that PG has no idea what the type is and essentially picks the function that has the right number of params. The bindvar hack must work this way, too, to give us the safety against SQL smuggling we seek (forces a type conversion with error checking.)

We thought about doing the name mapping (__ vs. .) when we first started porting code and these were some of the reasons we decided not to do it.

Now ... we did speak earlier about getting a lot of leverage by simply defining permission_p() for permission checking. This function has no default params in Oracle therefore the named parameter style isn't necessary. And everyone knows what the three parameters are for, anyway, since its use is so pervasive.

And getting rid of silly things like get_live_revision() vs. cr_items.live_revision helps too.

The CR has auto-SQL generation code that does stuff similar to what Tom's mentioning for content types, though it's not used outside the CMS. I've thought about adopting the CR's view trick for general acs_objects. That would make doing something like Tom mentions - autogenerating DML for inserts and updates - easier.

That would get rid of a lot of the PL/[pg]SQL entirely and, much like the new Tcl API for service contracts, would simplify code.

What do people think about trying to attack the problem by reducing dependence on PL/[pg]SQL?

8: Re: Eliminating 90% of DB-specific queries (response to 7)

Posted by Jeff Davis on 02/21/03 10:08 PM

I checked and of the 792 -oracle.xql files only 180 had named parameters in any of the queries (and of the 180 with named fields, the majority were plsql calls to .new methods). I think being able to get rid of 612 essentially duplicated files would be a good thing and I think if we made the CR style content creation stuff easier to use then we could get rid of a lot of the remaining 180.

9: Re: Eliminating 90% of DB-specific queries (response to 1)

Posted by Tom Jackson on 02/21/03 11:01 PM

The 'group type' creation code generates plpgsql and stores the necessary __new methods to create new groups of the type. Probably it does the same thing for oracle, so the differences between oracle and pg should be outlined in this code. The code isn't very general, and fails when your group type has more than a few attributes (unless you up the allowed number when you compile pg). My updated query-writer should produce code similar to what the group type code does, but it will write the result to a file instead of loading into the db.

To point out what Don has said: there is more than mapping __ to ., and the niceties of oracle should not be thrown away to make a query look like a pg query. We should be trying to make pg queries look like oracle queries. That requires some method of specifying defaults for attributes, specifying if an attribute can use a default, etc.

A fix should consider future development more than current code, since the current code already works. I wrote a package with over 500 files (only a few xql files, added before I finished query-writer), porting to oracle would only require .xql files for the select queries.

Here is an example my api used in the x-postgresql.xql files:

<querytext>
select [qw_write_fn supplier__new {
    party_id => :party_id
    name => :name
    address => :address
    phone => :phone
    email => :email
    contact => :contact
    creation_user => '1234'
    }]
</querytext>

The missing but required attributes are filled in with defaults. You can change the defaults by supplying a static value.

New development wouldn't use the .xql api, instead, you would use the tcl api:

set supplier(party_id) $party_id
set supplier(name) $name
set supplier(phone) $phone
set supplier(address) $address
set supplier(email) $email
set supplier(contact) $contact
set supplier(creation_user) "'1234'"

qw:new supplier supplier

Changing to oracle would require no changes in the tcl file, and no .xql files. I'm not recommending query-writer, just the concept of storing attribute information on an object basis and creating queries based on that information, not developer labor.

10: Re: Eliminating 90% of DB-specific queries (response to 1)

Posted by Don Baccus on 02/23/03 03:54 AM

Jeff, you're working from a false premise ...

Just because 612 query files don't contain named parameters doesn't mean they don't contain outer joins or date arithmetic or decode or other things that don't map directly into a common query form.

Now .. Oracle 9i *finally* supports standard SQL 92 outer join syntax. If we can pick a date for retiring support of 8i we can then use common queries for these.

I don't know what the state of AQL-compliant date support is in Oracle but date arithmetic can't be done in a common query at the moment (there are special cases that can be done commonly)

Oracle 8i supports a limited form of the CASE statement so some decodes can now be removed and queries shared with PG. But they don't support the most useful form which allows full removal of decode (CASE WHEN value THEN result WHEN value THEN result ELSE result) They only support the CASE WHEN boolexpr THEN result ELSE result.

At least that's all I could get to work in 8.1.6 and I just tested about a week ago. 8.1.5 didn't have any form of CASE.

Now ... the new calls can already be replaced with package_instantiate_object. I'm in favor of defining the parameter strings required by that proc right in the PG datamodelnow, if anyone wants to start down that path . The parameter string mechanism Dan implemented allows for the specifying of default values so the Tcl API to create an object is exactly the same for PG and Oracle. dotLRN (and forums and notifications) use it and it's slick.

This is a subset of what Tom's been trying to accomplish but it's very handy, though going the VIEW route is personally appealing to me.

11: Re: Eliminating 90% of DB-specific queries (response to 1)

Posted by Don Baccus on 02/24/03 09:55 PM

If there's anyone interested in this stuff who's willing to sit down and investigate 9i in detail, I think it would be really useful to have a list of new SQL99-compliant features that would let us fold together aueries.

For instance being able to share outer join queries would remove a major source of pain when porting packages between PG and Oracle.

While we may not want to drop 8i support for the core in the near future, people who want to write new packages that support both PG and Oracle are free to decide to support whichever versions they want, IMO, especially if it simplifies the porting process (of course we have to support 9i before anyone can choose to not support 8i!)

Also ... should we just move ahead and for starters define permission_p() in each so we can at least remove this source of difference? For 4.7? I'm willing to personally go through and rip out all cr_item.get_live_revision()-style function calls for 4.7.

Just these two steps give us incremental progress in the right direction, no?

Though unfortunately we're still stuck with CONNECT BY vs. tree_sortkeys and many content queries will either use those or outer joins ...