Forum OpenACS Development: Guidelines
I am ready to get my hands dirty Ben. Let me know what do you want me to work on (or do you want me to choose an area?) and I'll do it.
I am updating the OpenACS 3.x documentation (just finished the Install Guide). I should have a beta version of the PL/SQL-to-PL/PgSQL-HOWTO later tonight.
I should have some better documentation for PL/PgSQL (that will be contributed to the PG team) later this week.
drop table test; create table test ( int_var integer, ch_var char(1), fl_var float, vc_var varchar(20), tx_var text ); insert into test values (null,'t',1.0,'uno','dos'); insert into test values (1,null,2.0,'foo','bar'); insert into test values (2,'f',null,'ocho','siete'); insert into test values (3,'f',3.0,null,'foo'); insert into test values (4,'t',4.0,'baz',null); insert into test values (5,'f',5.0,'goodbye','hello'); drop function nvl(varchar,varchar); create function nvl(varchar,varchar) returns varchar as ' declare inv alias for $1; def alias for $2; begin return coalesce(inv,def); end;' language 'plpgsql'; drop function nvl(float,float); create function nvl(float,float) returns float as ' declare inv alias for $1; def alias for $2; begin return coalesce(inv,def); end;' language 'plpgsql'; drop function nvl(integer,integer); create function nvl(integer,integer) returns integer as ' declare inv alias for $1; def alias for $2; begin return coalesce(inv,def); end;' language 'plpgsql'; select nvl(int_var,0) as nvl_integer, nvl(ch_var,'NULLVAR') as nvl_char, nvl(fl_var,0.0) as nvl_float, nvl(vc_var,'NULLVAR') as nvl_varchar, nvl(tx_var,'NULLVAR') as nvl_text from test order by nvl_integer;
when i run this example I get the following output:
openacs4=# i example.sql DROP CREATE INSERT 66955 1 INSERT 66956 1 INSERT 66957 1 INSERT 66958 1 INSERT 66959 1 INSERT 66960 1 DROP CREATE DROP CREATE DROP CREATE nvl_integer | nvl_char | nvl_float | nvl_varchar | nvl_text -------------+----------+-----------+-------------+---------- 0 | t | 1 | uno | dos 1 | NULLVAR | 2 | foo | bar 2 | f | 0 | ocho | siete 3 | f | 3 | NULLVAR | foo 4 | t | 4 | baz | NULLVAR 5 | f | 5 | goodbye | hello (6 rows) openacs4=#
As you can see, it seems to work correctly. I had tried a similar test with decode before, but once I overloaded the decode for more than one type, pg couldn't resolve the type correctly without me explicitly adding a cast to the decode arguements. That now longer seems to be a problem with pg 7.1. I'll see if I can work up a test case for decode and post the results tomorrow.
I don't really have an issue with this, but why do we want to avoid using user-defined functions in the data-model? Is it just a question of efficiency?
that to decode, since it can take an arbritary number of arguments.
We'd have to create a different decode function clone for each new
One thing that I want to ask is: are we going the easy, more
maintainable route of creating functions to mimick Oracle's (e.g. nvl)
or are we porting everything to PG's dialect? Supposedly the latter
would have better performance (?) but it'll be harder to synch with
Roberto> I can see use creating an nvl clone, but I don't see how Roberto> to easily do that to decode, since it can take an Roberto> arbritary number of arguments. We'd have to create a Roberto> different decode function clone for each new argument.
It will be a little painful to create all of the overloaded versions of decode, but it's doable and probably less work than touching all of the .tcl files that will need to be touched for porting the decode statements to case when statements. It's also something that could be built up over time. Every time someone encounters a new decode form, instead of porting it to case when, they could just add the new form to postgresql.sql. The net result will be that as new packages are ported, the queries won't need to be touched to compensate for nvl, decode and other oracle functions.
Roberto> One thing that I want to ask is: are we going the easy, Roberto> more maintainable route of creating functions to mimick Roberto> Oracle's (e.g. nvl) or are we porting everything to PG's Roberto> dialect? Supposedly the latter would have better Roberto> performance (?) but it'll be harder to synch with aD's Roberto> tree.
Based on the fact that we are bothering with a query dispatcher, I think the goal is to touch as little of the .tcl code as possible. This will make maintenance of the code easier and will minimize our porting effort. Creating our own nvl and decode statements and other oracle functions should not impact our performance greatly.
Right now when I grep in acs 4.x, I only find ~36 decode statements in the code and the majority of those are the simple four arguement case.
- sysdate()'s not inefficient. The problem is that PG doesn't have
the equivalent of "create or replace function", and the compiled-in
reference to the function call in the datamodel means that dropping
and creating a functino forces you to drop and create any table
If we're 100% sure a function will never change, we won't be causing update problems for our users, but if we're not ... well, I think you see the problem.
now()'s built in and guaranteed not to change.
BTW, Oracle has a similar problem if you drop/create functions - that's why they've got the "create or replace" non-standard query. PG will be getting something equivalent, probably in 7.2, but for now we're best off just avoiding user functions in the datamodel unless we there's no alternative available.
- I would rather change "nvl" to "coalesce" and "decode" to "case"
rather than use overloaded functions.
They're easy to recode, and the resulting query works in both Oracle and PG so you don't need to split into separate queries for this reason alone. I've been trying to get aD to use these rather than the Oracle-isms but obviously with no success thus far. They do exactly the same thing.
An additional reason for rewriting the query to use proper SQL92 forms is that IB tends to support proper SQL92. If we rewrite the queries to use "coalesce" and "case" we're part-way home for a port to other DBs that follow the standard.
Since the rewrite doesn't break Oracle, and since it won't add significantly to our burden, my thinking was "why not change 'em".
The reality here is that aD is making life unnecessarily hard by not using SQL92 constructs when Oracle supports them.
For a number of cases, you're doing simple string transformations. It might be worth your investing in constructing a tool to do this automatically. Rafi recently showed me the power of JavaCC for creating parsers easily in Java. You can create an Oracle SQL parser fairly easily, and then write a back end for the parse tree to work properly with PostgreSQL. Actually, it looks like there is already an Oracle SQL grammar at the JavaCC grammar repository.
Once you get the tool working and all of the knowledge you guys have on porting statements encoded programmatically, you can translate all of the Oracle SQL we spit out trivially.
I'm also interested in a tool for specifying object models in a higher-level language than Oracle DDL + PL/SQL than the ACS currently uses. It sounds like you guys are going to take a different approach with PG's object-relational features which probably makes sense (i don't know enough to say). Again, if we worked out a common tool for creating objects, with us maintaining an Oracle back-end and you guys maintaining a PG back-end, we'd both save a lot of time. You could also have it configured to handle stuff like varchar(4000) => text automatically.
Another great benefit is that if you later want to change your transformation rules, you can, and just rerun the tool, rather than ask a bunch of people to grovel over the code again.
What do you think?
So simple, in fact, that we've simply regexp'd a bunch of them in the past. We're concerned about regexp memory leaks in Tcl8, though ...
A parser could help, yes, but you need semantic information - a simple symbol table, at minimum - not just syntax information to transform outer joins. An automatic tool to do this might be worth investigating since Oracle can't be bothered to support the SQL92 syntax for this particular feature.
The reason you need semantic information is that SQL92 performs outer joins in the FROM statement. Actually, all joins can be there. Wearing my language design hat, this makes sense since a JOIN operator joins tables, not keys.
becomesselect * from foo, bar where foo.key = bar.key(+)
(join is optional, you don't really write it with brackets!)select * from foo left [join] bar using(key);
Unqualified names require symbol table information - for tables, not just SQL statements - in order to properly correlate columns with their tables. The syntax of the SQL statement alone ain't enough.
So this looks like a long-term project for someone who's not burned out on compiler writing (I would be the person who *is* burned out on compiler writing...)
The outer joins are easy to rewrite by hand, at least. We're not faced with "UNION hell" like we were with PG 7.0 now that PG 7.1 has outer joins.
If someone were to cobble together a simple tool to massage Oracle DDL in particular, that would be a help. It could ignore anything it doesn't understand. On the other hand, those things only take a few minutes to edit by hand, too.
Our biggest task will be porting over PL/SQL to PL/pgSQL. The changes required there are beyond even a real compiler's ability to deal with - no default parameters in PL/pgSQL. Because of that, we'll use our intuitive powers to decide which variants are commonly used and, taking advantage of PG's ability to overload functions, provide a variety which default to common values in order to help the port process.
You folks should have a copy of Date and Darwin's "The SQL Standard". It's on my desktop next to "SQL for Smarties"...
We spoke a bit about the direction you're thinking of taking when I was at aD in early January. Done well, abstraction of this sort would be a great help. Not so much in porting but in maintenance, i.e. once ported these objects would often survive changes in client code without needing to be changed themselves.
And when change at this level's required, it will be isolated to a set of objects rather than scattered willy-nilly through a couple hundred thousand lines of source code.
ACS4 abstracts things to some extent through the use of a lot of PL/SQL, which then has to be ported to programmatic languages supported by other databases if it is to run somewhere else than on top of Oracle. PG has PL/pgSQL, but not all RDBMS's have an equivalent (and I think IB falls into that category).
So, yes, the approach you're talking about has the potential to be a big help. After OpenACS 4.1 is underway I'd love to take a look into providing PG support for the stuff you're working on. Again, I'll be in Boston for much of March ...
For folks reading this scratching their heads about the implications, we're not talking about ACS 4.1, which is a fait accompli, but rather the future.
While I agree that they are easy to port, I think it makes following the aD releases easier (less line noise in the diffs) if we don't change them from their nvl and case form. This thinking also applies to other oracle functions that we will come across. I think one of our goals in porting should be to minimize the changes we make to the .tcl files whenever possible as long, as we don't impact performance significantly.
"An additional reason for rewriting the query to use proper SQL92 forms is that IB tends to support proper SQL92. If we rewrite the queries to use "coalesce" and "case" we're part-way home for a port to other DBs that follow the standard."
It would be nice to have everyting sql92 compliant, but right now an interbase port seems kind of remote, while the likelyhood that we will need to maintain openacs4 relative to future releases of acs classic seems high. I tend to think that we should concentrate on miminizing our support hassles.
On the other hand, I think if Bryan can influence aD to use sql92 features as much then this whole discussion is moot, and we should go ahead use the sql92 constructs.
Do you think that future releases of acs 4.x will actually switch to using sql92 constructs whenever possible?
If we recode into SQL92 when practical at our end, it would hopefully add a little motivation to use SQL92 constructs at the aD end. Since our goal is to avoid breaking modules, and to separate out PG-dependent stuff, aD at least will have the option of integrating these changes.
Ben and I were thinking we (all those working on OpenACS, not just Ben and I) could help aD out by writing a code base that's more SQL92 compliant. There's potentially more to think about than just PG/Oracle support ...
Let me point out that Bryan made the point when he met with Ben and I last month that they often write Oracle-isms simply because they weren't aware that an SQL92-compliant equivalent existed *and* was supported by Oracle. Much as he stated above.
So I'd like to take the opportunity to encourage movement in the SQL92 direction.
I see a lot of reasons for us to code in a more SQL92 style at the aD end. What we'll need to is create a guidelines page for aD similar to the one Don and Ben wrote up with the different rules on how we should write SQL. I am not personally familiar with how well and how consistently Oracle supports SQL92 constructs, so some of us here will need to get more familiar and dig into the issue before I can say more. We're in the process of doing that, so I'll try to post updates.
Oracle's JDBC driver doesn't support the SQL92 outer join syntax from what I can tell. However, Oracle's docs claim it supports all other SQL92 escapes.
I need to take a closer look at some of the DDL issues. Rafi knows a lot about this. We're thinking of reducing our usage of PL/SQL, but maintaining the procedural abstractions at the business domain tier. Personally I'd like to avoid using named parameters within PL/SQL (the whole
foo => :bar thing). Getting away from default parameters sounds like a good idea too.
As Don mentions, this is all discussion towards establishing standards for development going forward and does not affect ACS 4.1.
openacs4=# i tst.sql DROP CREATE psql:tst.sql:15: NOTICE: inside tsta psql:tst.sql:15: ERROR: unexpected SELECT query in exec_stmt_execsql()
create table acs_objects ( object_id integer primary key, object_type varchar(100) ); create function tst_insert (integer, varchar) returns integer as ' declare object_id alias for $1; object_type alias for $2; begin insert into acs_objects (object_id, object_type) values (object_id, object_type); return 0; end;' language 'plpgsql'; select tst_insert(1,'acs_object'); acspg=# i tst.sql Dsql:tst.sql:7: NOTICE: CREATE TABLE/PRIMARY KEY will create implicit index 'acs_objects_pkey' for table 'acs_objects' CREATE CREATE psql:tst.sql:25: ERROR: parser: parse error at or near "$1" acspg=#
To fix this, you need to modify the input variable name so it doesn't match the column name used in the insert.
for future, not now) there are some useful python tools with SQL
grammar at https://web.archive.org/web/20020209211730/http://www.chordate.com/kwParsing/index.html
(included with gadfly).
writes: > -- This select is failing. It does not return the correct value > -- when object_id = 44 > select context_id, security_inherit_p > into context_id, security_inherit_p > from acs_objects > where object_id = check_path__object_id; Try distinguishing the field names from the plpgsql variable names. I believe the machine is seeing this as a command to select the current values of the plpgsql variables (ie, two NULLs) into those same variables. IIRC, unqualified names will be matched first to plpgsql variables and only second to fields of the query tables. regards, tom lane
queries for a bunch of reasons, this is just one more reason to do so.
In general pl/pgsql seems to be pretty dumb about sql, and it acts almost like the 'c' preprocessor with respect to pl/pgsql variables in sql statements. If it finds something in an sql statement that matches a pl/pgsql varible name it just goes ahead and does a substitution of the value. The other day I found a for loop that looked something like the following:
for t in select .... from foo t,
pl/pgsql tried to substitute the value for the record t into the table alias 't', which resulted in an error.
The spirit of the doc is that if the restriction on length exists for a good reason then keep it. Otherwise use text. Since text is compressed and/or flung into the equivalent of a CLOB table automatically if the content's long, I think perhaps we should keep all <4000 varchars with their restricted length. There might be minor efficiency gains though I'm not familiar enough with the new text type implementation to know for sure.
varchar(1000) and varchar(4000) in that if the need was for more
than 1000 chars, varchar(4000) was used. In general, I see little
practical need to limit a varchar to something more than 1000
characters (although I can see some application-specific needs
for certain cases).
I think it would be fine to use text for > 1000, but if there is a
reason to limit the length of the text, go ahead and limit it using
varchar, as Don mentioned.