Forum OpenACS Development: Guidelines

Collapse
Posted by Ben Adida on
I've posted the initial guidelines that Don and I came up with a few days ago. You can read them at https://openacs.org/4/guidelines.adp. Please feel free to comment in this bboard about the guidelines.
Collapse
2: Response to Guidelines (response to 1)
Posted by Roberto Mello on
Horay !!

I am ready to get my hands dirty Ben. Let me know what do you want me to work on (or do you want me to choose an area?) and I'll do it.

I am updating the OpenACS 3.x documentation (just finished the Install Guide). I should have a beta version of the PL/SQL-to-PL/PgSQL-HOWTO later tonight.

I should have some better documentation for PL/PgSQL (that will be contributed to the PG team) later this week.

Collapse
3: Response to Guidelines (response to 1)
Posted by Dan Wickstrom on
Since the function manager in pg has been rewritten to correctly handle nulls, I think we should consider creating equivalents for nvl and decode. Here is a test case that I tried for nvl:

drop table test;
create table test (
       int_var integer,
       ch_var  char(1),
       fl_var  float,
       vc_var  varchar(20),
       tx_var  text
);


insert into test values (null,'t',1.0,'uno','dos');
insert into test values (1,null,2.0,'foo','bar');
insert into test values (2,'f',null,'ocho','siete');
insert into test values (3,'f',3.0,null,'foo');
insert into test values (4,'t',4.0,'baz',null);
insert into test values (5,'f',5.0,'goodbye','hello');

drop function nvl(varchar,varchar);
create function nvl(varchar,varchar) returns varchar as '
declare
        inv      alias for $1;
        def     alias for $2;
begin
        return coalesce(inv,def);
end;' language 'plpgsql';
drop function nvl(float,float);
create function nvl(float,float) returns float as '
declare
        inv      alias for $1;
        def     alias for $2;
begin
        return coalesce(inv,def);
end;' language 'plpgsql';

drop function nvl(integer,integer);
create function nvl(integer,integer) returns integer as '
declare
        inv      alias for $1;
        def     alias for $2;
begin
        return coalesce(inv,def);
end;' language 'plpgsql';

select nvl(int_var,0) as nvl_integer,
       nvl(ch_var,'NULLVAR') as nvl_char,
       nvl(fl_var,0.0) as nvl_float,
       nvl(vc_var,'NULLVAR') as nvl_varchar,        
       nvl(tx_var,'NULLVAR') as nvl_text from test order by nvl_integer;

when i run this example I get the following output:

openacs4=# i example.sql
DROP
CREATE
INSERT 66955 1
INSERT 66956 1
INSERT 66957 1
INSERT 66958 1
INSERT 66959 1
INSERT 66960 1
DROP
CREATE
DROP
CREATE
DROP
CREATE
 nvl_integer | nvl_char | nvl_float | nvl_varchar | nvl_text
-------------+----------+-----------+-------------+----------
           0 | t        |         1 | uno         | dos
           1 | NULLVAR  |         2 | foo         | bar
           2 | f        |         0 | ocho        | siete
           3 | f        |         3 | NULLVAR     | foo
           4 | t        |         4 | baz         | NULLVAR
           5 | f        |         5 | goodbye     | hello
(6 rows)
 
openacs4=#  

As you can see, it seems to work correctly. I had tried a similar test with decode before, but once I overloaded the decode for more than one type, pg couldn't resolve the type correctly without me explicitly adding a cast to the decode arguements. That now longer seems to be a problem with pg 7.1. I'll see if I can work up a test case for decode and post the results tomorrow.

Collapse
4: Response to Guidelines (response to 1)
Posted by Dan Wickstrom on
"No using sysdate in the data model. This is pretty important given that we don't really want to have the basic data-model depend on a PL/SQL procedure. In general, avoid user-defined functions in the data model. Otherwise, the sysdate() function works fine. "

I don't really have an issue with this, but why do we want to avoid using user-defined functions in the data-model? Is it just a question of efficiency?

Collapse
5: Response to Guidelines (response to 1)
Posted by Cynthia Kiser on
My ignorance of PostGres is showing. Would someone mind giving me the 3 sentence version of why no sysdate in data models? Does PostGres not have a native equivalent of Oracle's sysdate to use as a default in date columns? I infer there is a function called sysdate(); is it inefficient?
Collapse
6: Response to Guidelines (response to 1)
Posted by Roberto Mello on
Postgres has now(). sysdate() is a function we created to ease ACS 3.x porting. I am not sure now() is more efficient than sysdate() but my guess is that it is.
Collapse
7: Response to Guidelines (response to 1)
Posted by Roberto Mello on
I can see use creating an nvl clone, but I don't see how to easily do
that to decode, since it can take an arbritary number of arguments.
We'd have to create a different decode function clone for each new
argument.

One thing that I want to ask is: are we going the easy, more
maintainable route of creating functions to mimick Oracle's (e.g. nvl)
or are we porting everything to PG's dialect? Supposedly the latter
would have better performance (?) but it'll be harder to synch with
aD's tree.

Collapse
8: Response to Guidelines (response to 1)
Posted by Dan Wickstrom on
 
    Roberto> I can see use creating an nvl clone, but I don't see how
    Roberto> to easily do that to decode, since it can take an
    Roberto> arbritary number of arguments.  We'd have to create a
    Roberto> different decode function clone for each new argument.

It will be a little painful to create all of the overloaded versions of decode, but it's doable and probably less work than touching all of the .tcl files that will need to be touched for porting the decode statements to case when statements. It's also something that could be built up over time. Every time someone encounters a new decode form, instead of porting it to case when, they could just add the new form to postgresql.sql. The net result will be that as new packages are ported, the queries won't need to be touched to compensate for nvl, decode and other oracle functions.

    Roberto> One thing that I want to ask is: are we going the easy,
    Roberto> more maintainable route of creating functions to mimick
    Roberto> Oracle's (e.g. nvl) or are we porting everything to PG's
    Roberto> dialect? Supposedly the latter would have better
    Roberto> performance (?) but it'll be harder to synch with aD's
    Roberto> tree.

Based on the fact that we are bothering with a query dispatcher, I think the goal is to touch as little of the .tcl code as possible. This will make maintenance of the code easier and will minimize our porting effort. Creating our own nvl and decode statements and other oracle functions should not impact our performance greatly.

Right now when I grep in acs 4.x, I only find ~36 decode statements in the code and the majority of those are the simple four arguement case.

Collapse
9: Response to Guidelines (response to 1)
Posted by Don Baccus on
OK, a couple quick comments:
  • sysdate()'s not inefficient. The problem is that PG doesn't have the equivalent of "create or replace function", and the compiled-in reference to the function call in the datamodel means that dropping and creating a functino forces you to drop and create any table referencing it.

    If we're 100% sure a function will never change, we won't be causing update problems for our users, but if we're not ... well, I think you see the problem.

    now()'s built in and guaranteed not to change.

    BTW, Oracle has a similar problem if you drop/create functions - that's why they've got the "create or replace" non-standard query. PG will be getting something equivalent, probably in 7.2, but for now we're best off just avoiding user functions in the datamodel unless we there's no alternative available.

  • I would rather change "nvl" to "coalesce" and "decode" to "case" rather than use overloaded functions.

    They're easy to recode, and the resulting query works in both Oracle and PG so you don't need to split into separate queries for this reason alone. I've been trying to get aD to use these rather than the Oracle-isms but obviously with no success thus far. They do exactly the same thing.

    An additional reason for rewriting the query to use proper SQL92 forms is that IB tends to support proper SQL92. If we rewrite the queries to use "coalesce" and "case" we're part-way home for a port to other DBs that follow the standard.

    Since the rewrite doesn't break Oracle, and since it won't add significantly to our burden, my thinking was "why not change 'em".

    The reality here is that aD is making life unnecessarily hard by not using SQL92 constructs when Oracle supports them.

Collapse
10: Response to Guidelines (response to 1)
Posted by Bryan Quinn on
Rafi and I have been thinking about standards to apply within the ACS to make such porting efforts described here easier. I read the guidelines page and found it interesting. I'll be interested to follow any thing you guys write about porting DDL, queries, and the like. Any guidelines you want us to follow in writing Oracle SQL would be welcome. You guys know both PostgreSQL and Oracle very well, but we don't know PG too well. Any requirements you pass along to be more SQL92 compliant are appreciated.

For a number of cases, you're doing simple string transformations. It might be worth your investing in constructing a tool to do this automatically. Rafi recently showed me the power of JavaCC for creating parsers easily in Java. You can create an Oracle SQL parser fairly easily, and then write a back end for the parse tree to work properly with PostgreSQL. Actually, it looks like there is already an Oracle SQL grammar at the JavaCC grammar repository.

Once you get the tool working and all of the knowledge you guys have on porting statements encoded programmatically, you can translate all of the Oracle SQL we spit out trivially.

I'm also interested in a tool for specifying object models in a higher-level language than Oracle DDL + PL/SQL than the ACS currently uses. It sounds like you guys are going to take a different approach with PG's object-relational features which probably makes sense (i don't know enough to say). Again, if we worked out a common tool for creating objects, with us maintaining an Oracle back-end and you guys maintaining a PG back-end, we'd both save a lot of time. You could also have it configured to handle stuff like varchar(4000) => text automatically.

Another great benefit is that if you later want to change your transformation rules, you can, and just rerun the tool, rather than ask a bunch of people to grovel over the code again.

What do you think?

Collapse
11: Response to Guidelines (response to 1)
Posted by Don Baccus on
For a number of cases, you're doing simple string transformations.

So simple, in fact, that we've simply regexp'd a bunch of them in the past. We're concerned about regexp memory leaks in Tcl8, though ...

A parser could help, yes, but you need semantic information - a simple symbol table, at minimum - not just syntax information to transform outer joins. An automatic tool to do this might be worth investigating since Oracle can't be bothered to support the SQL92 syntax for this particular feature.

The reason you need semantic information is that SQL92 performs outer joins in the FROM statement. Actually, all joins can be there. Wearing my language design hat, this makes sense since a JOIN operator joins tables, not keys.

select * from foo, bar where foo.key = bar.key(+)
becomes
select * from foo left [join] bar using(key);
(join is optional, you don't really write it with brackets!)

Unqualified names require symbol table information - for tables, not just SQL statements - in order to properly correlate columns with their tables. The syntax of the SQL statement alone ain't enough.

So this looks like a long-term project for someone who's not burned out on compiler writing (I would be the person who *is* burned out on compiler writing...)

The outer joins are easy to rewrite by hand, at least. We're not faced with "UNION hell" like we were with PG 7.0 now that PG 7.1 has outer joins.

If someone were to cobble together a simple tool to massage Oracle DDL in particular, that would be a help. It could ignore anything it doesn't understand. On the other hand, those things only take a few minutes to edit by hand, too.

Our biggest task will be porting over PL/SQL to PL/pgSQL. The changes required there are beyond even a real compiler's ability to deal with - no default parameters in PL/pgSQL. Because of that, we'll use our intuitive powers to decide which variants are commonly used and, taking advantage of PG's ability to overload functions, provide a variety which default to common values in order to help the port process.

Collapse
12: Response to Guidelines (response to 1)
Posted by Don Baccus on
We should provide you help with guidelines for writing more portable SQL.  I'll be back in Boston again on March 7th, maybe by then I can get together a list of suggestions for you.  Feel free to bug me...

You folks should have a copy of Date and Darwin's "The SQL Standard".  It's on my desktop next to "SQL for Smarties"...

Collapse
13: Response to Guidelines (response to 1)
Posted by Don Baccus on
Again, if we worked out a common tool for creating objects, with us maintaining an Oracle back-end and you guys maintaining a PG back-end, we'd both save a lot of time.

We spoke a bit about the direction you're thinking of taking when I was at aD in early January. Done well, abstraction of this sort would be a great help. Not so much in porting but in maintenance, i.e. once ported these objects would often survive changes in client code without needing to be changed themselves.

And when change at this level's required, it will be isolated to a set of objects rather than scattered willy-nilly through a couple hundred thousand lines of source code.

ACS4 abstracts things to some extent through the use of a lot of PL/SQL, which then has to be ported to programmatic languages supported by other databases if it is to run somewhere else than on top of Oracle. PG has PL/pgSQL, but not all RDBMS's have an equivalent (and I think IB falls into that category).

So, yes, the approach you're talking about has the potential to be a big help. After OpenACS 4.1 is underway I'd love to take a look into providing PG support for the stuff you're working on. Again, I'll be in Boston for much of March ...

For folks reading this scratching their heads about the implications, we're not talking about ACS 4.1, which is a fait accompli, but rather the future.

Collapse
14: Response to Guidelines (response to 1)
Posted by Dan Wickstrom on
"They're easy to recode, and the resulting query works in both Oracle and PG so you don't need to split into separate queries for this reason alone. I've been trying to get aD to use these rather than the Oracle-isms but obviously with no success thus far. They do exactly the same thing."

While I agree that they are easy to port, I think it makes following the aD releases easier (less line noise in the diffs) if we don't change them from their nvl and case form. This thinking also applies to other oracle functions that we will come across. I think one of our goals in porting should be to minimize the changes we make to the .tcl files whenever possible as long, as we don't impact performance significantly.

"An additional reason for rewriting the query to use proper SQL92 forms is that IB tends to support proper SQL92. If we rewrite the queries to use "coalesce" and "case" we're part-way home for a port to other DBs that follow the standard."

It would be nice to have everyting sql92 compliant, but right now an interbase port seems kind of remote, while the likelyhood that we will need to maintain openacs4 relative to future releases of acs classic seems high. I tend to think that we should concentrate on miminizing our support hassles.

On the other hand, I think if Bryan can influence aD to use sql92 features as much then this whole discussion is moot, and we should go ahead use the sql92 constructs.

Bryan,

Do you think that future releases of acs 4.x will actually switch to using sql92 constructs whenever possible?

Collapse
15: Response to Guidelines (response to 1)
Posted by Don Baccus on
Remember that we're not intending to break Oracle support at our end.

If we recode into SQL92 when practical at our end, it would hopefully  add a little motivation to use SQL92 constructs at the aD end.  Since  our goal is to avoid breaking modules, and to separate out PG-dependent stuff, aD at least will have the option of integrating these changes.

Ben and I were thinking we (all those working on OpenACS, not just Ben and I) could help aD out by writing a code base that's more SQL92 compliant.  There's potentially more to think about than just PG/Oracle support ...

Let me point out that Bryan made the point when he met with Ben and I last month that they often write Oracle-isms simply because they weren't aware that an SQL92-compliant equivalent existed *and* was supported by Oracle.  Much as he stated above.

So I'd like to take the opportunity to encourage movement in the SQL92 direction.

Collapse
16: Response to Guidelines (response to 1)
Posted by Dan Wickstrom on
Well it's good to get a little more insight as to what you and Ben discussed when you where at aD last month.  Another area that needs discussion is the query dispatcher.  Would you or Ben care to give an overview on the query dispatcher and the design goals for it?
Collapse
17: Response to Guidelines (response to 1)
Posted by Don Baccus on
Ben's supposed to post a target in a few days.  Do you need help priming your gun? :)
Collapse
18: Response to Guidelines (response to 1)
Posted by Bryan Quinn on
I've ordered a copy of Date & Darwen. Thanks for the reccomendation!

I see a lot of reasons for us to code in a more SQL92 style at the aD end. What we'll need to is create a guidelines page for aD similar to the one Don and Ben wrote up with the different rules on how we should write SQL. I am not personally familiar with how well and how consistently Oracle supports SQL92 constructs, so some of us here will need to get more familiar and dig into the issue before I can say more. We're in the process of doing that, so I'll try to post updates.

Oracle's JDBC driver doesn't support the SQL92 outer join syntax from what I can tell. However, Oracle's docs claim it supports all other SQL92 escapes.

I need to take a closer look at some of the DDL issues. Rafi knows a lot about this. We're thinking of reducing our usage of PL/SQL, but maintaining the procedural abstractions at the business domain tier. Personally I'd like to avoid using named parameters within PL/SQL (the whole foo => :bar thing). Getting away from default parameters sounds like a good idea too.

As Don mentions, this is all discussion towards establishing standards for development going forward and does not affect ACS 4.1.

Collapse
19: Response to Guidelines (response to 1)
Posted by Dan Wickstrom on
I might of missed this in the porting guide, but when you are porting functions/procedures and you don't care about the result of a function call (e.g. ported procedure call), then you need to call the function by using "PERFORM" in place of "SELECT" or by selecting or returning into a dummy variable. Otherwise, you'll get something like:

openacs4=# i tst.sql
DROP
CREATE
psql:tst.sql:15: NOTICE:  inside tsta
psql:tst.sql:15: ERROR:  unexpected SELECT query in exec_stmt_execsql()  
Collapse
20: Response to Guidelines (response to 1)
Posted by Dan Wickstrom on
When porting functions and procdures, you need to watch out for the case where a function input variable name matches a column name used in an insert. This pattern as shown below is pretty common in the acs classic 4.x code.

create table acs_objects (
       object_id         integer primary key,
       object_type       varchar(100)
);


create function tst_insert (integer, varchar) returns integer as '
declare
        object_id       alias for $1;
        object_type     alias for $2;
begin
        insert into acs_objects 
                    (object_id, object_type) 
                     values
                    (object_id, object_type);

        return 0;
                               
end;' language 'plpgsql';

select tst_insert(1,'acs_object');

acspg=# i tst.sql
Dsql:tst.sql:7: NOTICE:  CREATE TABLE/PRIMARY KEY will create implicit index 'acs_objects_pkey' for table 'acs_objects'
CREATE
CREATE
psql:tst.sql:25: ERROR:  parser: parse error at or near "$1"
acspg=#

To fix this, you need to modify the input variable name so it doesn't match the column name used in the insert.

Collapse
21: Response to Guidelines (response to 1)
Posted by Albert Langer on
If anyone's keen on the SQL parsing/transformation idea (presumably
for future, not now) there are some useful python tools with SQL
grammar at https://web.archive.org/web/20020209211730/http://www.chordate.com/kwParsing/index.html
(included with gadfly).
Collapse
22: Response to Guidelines (response to 1)
Posted by Dan Wickstrom on
My above comment above applies to select statements as well. Here is a response that I received on the pg mailing list regarding a problem with a select failing inside a ported function:

Daniel Wickstrom writes: > -- This select is failing. It does not return the correct value > -- when object_id = 44 > select context_id, security_inherit_p > into context_id, security_inherit_p > from acs_objects > where object_id = check_path__object_id; Try distinguishing the field names from the plpgsql variable names. I believe the machine is seeing this as a command to select the current values of the plpgsql variables (ie, two NULLs) into those same variables. IIRC, unqualified names will be matched first to plpgsql variables and only second to fields of the query tables. regards, tom lane
Collapse
23: Response to Guidelines (response to 1)
Posted by Don Baccus on
I have gotten into the habit of always qualifying my column names in
queries for a bunch of reasons, this is just one more reason to do so.
Collapse
24: Response to Guidelines (response to 1)
Posted by Dan Wickstrom on
Qualifying the column names seems like a good practice.

In general pl/pgsql seems to be pretty dumb about sql, and it acts almost like the 'c' preprocessor with respect to pl/pgsql variables in sql statements.  If it finds something in an sql statement that matches a pl/pgsql varible name it just goes ahead and does a substitution of the value.  The other day I found a for loop that looked something like the following:

for t in select .... from foo t,

pl/pgsql tried to substitute the value for the record t into the table alias 't', which resulted in an error.

Collapse
25: Response to Guidelines (response to 1)
Posted by Jon Griffin on
What to do with varchar > 1000, the doc doesn't specify. I imagine they should by text?
Collapse
26: Response to Guidelines (response to 1)
Posted by Don Baccus on
Hmmm...there is a rather large gap between 1000 and 4000 in the doc, isn't there?

The spirit of the doc is that if the restriction on length exists for a good reason then keep it.  Otherwise use text.  Since text is compressed and/or flung into the equivalent of a CLOB table automatically if the content's long, I think perhaps we should keep all <4000 varchars with their restricted length.  There might be minor efficiency gains though I'm not familiar enough with the new text type implementation to know for sure.

Collapse
27: Response to Guidelines (response to 1)
Posted by Ben Adida on
In general, ACS 3.x used to have a "gap" too between
varchar(1000) and varchar(4000) in that if the need was for more
than 1000 chars, varchar(4000) was used. In general, I see little
practical need to limit a varchar to something more than 1000
characters (although I can see some application-specific needs
for certain cases).

I think it would be fine to use text for > 1000, but if there is a
reason to limit the length of the text, go ahead and limit it using
varchar, as Don mentioned.