Forum OpenACS Development: Current short-term happenings...

Posted by Don Baccus on
Here's a cut at our expected schedule for this week:

1. Ben expects to be able to commit his first version of the query
dispatcher this (Monday, April 2nd) evening, along with ported queries for
most/much/all of the stuff in the acs-tcl package.

2. I'm going to hold off on committing my first version of the multi-db
aware APM until afterwards, in part because I've made additional
changes to the bootstrapper that I'll want to merge with Ben's commit.
I've got Oracle rebuilt on my laptop and will work on the APM this
afternoon and tomorrow.  If Ben gets his commit done this evening I
should be able to commit by tomorrow evening.

My first commit will probably still require Oracle to build a new
package (for either Oracle or Postgres or both) but I may be able to
squeeze out a fully-ported version by late tomorrow.  We'll see...the
APM's use of the RDBMS is pretty simple, but I've also got a bunch of
stuff to do (not the least of which being my taxes).

3. At that point we'll be ready to organize folks to do porting.  I
plan to work on this starting in a day or two.  Once Ben and I get our
commits made I'd encourage people to download and try things out.

Administrative stuff to do includes deciding who works on what (and
I'm still interested in comments on the fairest/most efficient way to
do this with a group of folks who mostly have never met each other or
worked together before), setting up the CVS tree for various modules,
and deciding on a short-term methodology for collecting and
distributing status reports.

Should we be asking individual porters to work on an estimated
schedule for various modules or should we just run freestyle for a bit
with published status reports to keep folks informed?

Posted by Malte Sussdorff on
Honestly, I think going freestyle for a start is a good approach. See what happens. But insist on biweekly status reports (so other people might actually help out if things get sluggish for personal reasons).
Posted by Michael Feldstein on
I would suggest something halfway between a formal schedule and freestyle. Perhaps each porter (or team of porters) could submit their proposed plan for the next two weeks in each bi-weekly report. That way people are encouraged to set goals but they're in small enough chunks that they can be highly flexible.
Posted by Don Baccus on
In reality I think it will be impossible for people to set even a rough schedule for the larger modules, at least people who are new to ACS 4.x.

So freestyle will dominate by necessity.  What I think I'll do is ask folks working on larger modules to first do an assessment so we can get a handle on difficulties we might face.

For instance, a module that depends on Oracle's built-in java-based e-mail utils (are there any, or was that just ACES?) will be harder to port than something like News that is mostly a UI and presentation wrapper around the Content Repository.

The first thing we'll want will be a list of more difficult problems which need a non-Oracle solution, because there might well be overlaps between modules and we won't want to spend time solving problems twice.  And we'll want to float ideas for solutions for comment by the community.

Posted by Kapil Thangavelu on
in addition, i'm making another release of the acs-sql-extractor later
this week (probably wednesday). this is the first release ready for
mass usage/testing. it's been tested on python 1.5.2/2.0 on linux
and windows :) with the aD acs4 packages from cvs and on the openacs
cvs. it seems to perform perfectly afaics. at this point i'm just
cleaning up and documenting the code, and i've started working on
transforming the original tcl source to use the query
dispatcher calls. i'm not sure of the exact syntax this
transformation should produce; right now i'm working on the assumption
that something like

db_0or1row foo_select "select object_id from site_nodes where node_id=:node_id"

gets transformed into
db_0or1row foo_select [db_fullquery_get_text [db_fullquery_fetch "xxx.yyy.foo_select"]]

does this look right?

feedback on the extractor is welcome.

for those who don't know what the extractor is... it's a software
thingy that eases the developer burden of conforming to the requirements
of the query dispatcher by extracting the queries from .tcl files
and converting them to the xml format files that the query
dispatcher expects. it will also transform your original db calls
into the appropriate query dispatcher calls.
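A minimal Python sketch of the extraction step described above, assuming each query is a single double-quoted Tcl word right after the statement name (no embedded quotes, braces or [command] substitutions, which a real extractor has to handle). The `<queryset>`/`<fullquery>`/`<querytext>` element names and the dotted `prefix.name` naming are illustrative assumptions, not the extractor's actual output format.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Matches e.g.:  db_0or1row foo_select "select ... where node_id=:node_id"
# Assumption: the SQL is one double-quoted Tcl word with no embedded quotes,
# braces or [command] substitutions; a real extractor must handle those.
DB_CALL = re.compile(
    r'\b(db_0or1row|db_1row|db_string|db_foreach|db_dml|db_list)'
    r'\s+(\w+)\s+"([^"]*)"'
)


def extract_queries(tcl_source, prefix):
    """Return (full_name, sql) pairs for every matching db_* call."""
    return [(f"{prefix}.{name}", sql.strip())
            for _command, name, sql in DB_CALL.findall(tcl_source)]


def to_xml(queries):
    """Render queries in a hypothetical <queryset>/<fullquery> layout."""
    lines = ["<queryset>"]
    for full_name, sql in queries:
        lines.append(f"  <fullquery name={quoteattr(full_name)}>")
        lines.append(f"    <querytext>{escape(sql)}</querytext>")
        lines.append("  </fullquery>")
    lines.append("</queryset>")
    return "\n".join(lines)


if __name__ == "__main__":
    tcl = 'db_0or1row foo_select "select object_id from site_nodes where node_id=:node_id"'
    print(to_xml(extract_queries(tcl, "xxx.yyy")))
```

Escaping matters here because SQL routinely contains `<` and `>`, which would otherwise break the XML file.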

a couple of other ideas on various utilities to toss into the mix
have occurred to me.

  - package .info metadata compiled into sets of html pages with
packages, sorted by num of files, version, dependencies, etc. should
be useful initially for people to pick a port to work on.

  - the transformer can just as easily replace query dispatch calls
with a db-specific query, so that you can bypass the dispatcher
if you need the extra speed. this should work round-trip, so you can
make the package available again in a db-neutral manner.

  - i was looking at the code for the query dispatcher in cvs, and
realized it's using ns_xml, which is based on the dated and somewhat
buggy libxml (hence libxml2), not to mention that ns_xml is probably
not going to be maintained. it should be trivial to convert the
query xml files into a format the query dispatcher could turn
into tcl data structs with some simple regexes.

  - generate html of the queries on a per-package basis, to be
included with the package html. allows a developer to quickly view
the sql in a web browser.... perhaps of dubious utility...
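The regex idea in the third item above could look roughly like this Python sketch, which reads a query file back into a name-to-SQL dictionary (a Tcl version would fill an array instead). It assumes the same hypothetical `<fullquery name="..."><querytext>` layout, and it only works because that layout is fixed and simple: no comments, CDATA sections, or attributes other than `name`.

```python
import re
from xml.sax.saxutils import unescape

# Assumes a fixed, simple layout:
#   <fullquery name="..."><querytext>...</querytext></fullquery>
FULLQUERY = re.compile(
    r'<fullquery\s+name="([^"]+)"\s*>\s*'
    r'<querytext>(.*?)</querytext>\s*</fullquery>',
    re.DOTALL,
)


def load_queries(xml_text):
    """Map each query name to its SQL text, undoing XML entity escapes."""
    return {name: unescape(sql).strip()
            for name, sql in FULLQUERY.findall(xml_text)}
```

The non-greedy `(.*?)` stops at the first `</querytext>`, so several queries in one file are picked up separately.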

i'm trying to make this system into a suite of openacs developer
tools so if people have other ideas, i'm all ears... umm.. eyes:).

i'm planning on making releases of this package on at least a
biweekly basis for probably the next two months. release early,
release often... the internal interfaces aren't completely stable in
this release, but they will be soon once i finish the xml parsing
jaunx.

Posted by Kapil Thangavelu on
forgot to mention i'm going to try and port the system to acs-java
as well... (minus the transformations)
Posted by Don Baccus on
I forwarded part of this to Ben so Kapil can hopefully get an answer to his dispatcher question ASAP.

As far as the extractor and other tools go ...

Some of the package summary stuff Kapil discusses is handled by the APM, though awkwardly (i.e. you have to visit the package to see the summary information).

It would be nice to have more summary information for installed modules as a whole within the APM itself, rather than as a standalone tool.

In addition, one thing I've been thinking would be useful is a dependency tool that orders the modules according to where they slot into the dependency hierarchy.

This ordered list would then tell us what order we need to port
modules in.

The system's simple enough so that we can figure this out by hand
quickly enough but writing the queries to generate the list
automatically would be easy...any takers?
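Don suggests generating this list with queries; as a swapped-in alternative done outside the database, the same ordering is a topological sort, sketched here in Python with the standard `graphlib` module (Python 3.9+). The package names and dependencies in `DEPENDS_ON` are made up for illustration; real data would come from each package's .info file or the APM.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency data: package -> packages it requires.
# Real data would come from each package's .info file or the APM tables.
DEPENDS_ON = {
    "acs-kernel": [],
    "acs-tcl": ["acs-kernel"],
    "acs-templating": ["acs-kernel", "acs-tcl"],
    "acs-content-repository": ["acs-kernel", "acs-tcl"],
    "news": ["acs-content-repository", "acs-templating"],
}


def port_order(depends_on):
    """Return packages ordered so each one comes after everything it requires.

    A dependency cycle raises graphlib.CycleError, which is worth reporting too.
    """
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    for package in port_order(DEPENDS_ON):
        print(package)
```

Packages with no ordering constraint between them (here the templating and content repository packages) may come out in either order, which is also a hint that they could be ported in parallel.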

In regard to nsxml versions this vs that, at the moment we're using THREE XML parsers in OpenACS - the
Tcl-based one used by the APM, the Oracle Java-based XML parser
used by CMS, and now nsxml used by the query dispatcher.

By release time that needs to be cut down to *one* XML parser.

Ben wrote nsxml (though not libxml of course) and I imagine he knows
that if we're going to use it, he's going to maintain it :)

I don't really care which parser we use but personally find the
notion of employing three of them unacceptable.

Posted by Kapil Thangavelu on
i also emailed ben privately on this one, because i felt it was
urgent, and i intended to post the reply back to the bboard.

i think it would be nice to have the package summaries in the apm,
but for me it was more a matter of extra xml lying around and i just
went through the trouble of writing a generic sax stack object
handler and had this handy dandy framework that was in want of
use:), in other words i climbed the mountain because it was there
and looked like a molehill:)

re parsers. moving to one sounds ideal; i guess we're not using the
oracle java parser :). umm... looking through the source of nsxml it
looks like it was written by curtis galloway of aD in march 2000...
unless i'm missing something it looks like it's going the way of
all aD tcl things, which, considering it hasn't been touched in the
last 6 months, isn't a big change. someone want to take up the torch?

as far as the bugginess of libxml goes, i was mistaken: i was under the
impression that ns_xml was working against libxml1, but looking over
the source i see it's for libxml2, which is a pretty nice system. fyi,
about libxml1, i'll let the authors speak for themselves: go to
libxml.org and notice the warnings in red on both the front page and
faq that state Do Not Use libxml1.

Posted by Don Baccus on
I was mistaken in regard to Ben's having lashed together nsxml.  As I mentioned in e-mail the only real maintenance issue is probably eventually migrating to Tcl Objects once OpenNSD dumps Tcl 7.6.

Ben's offered to move the XML APM install parse to nsxml once things settle down (only one routine and the parse is simple).

So that will just leave the Oracle Java bit, which I'd like to replace for both Oracle and PG in order to avoid duplicating sources.

Posted by Krzysztof Kowalczyk on
Regarding which XML parser to use: I would suggest sticking to nsxml. Reasons:
  • it's based on libxml that seems to be actively maintained and quickly evolving (they have some support for XSLT already)
  • I believe it already has enough functionality for OpenACS needs and improving it would be rather easy
  • it's true that aD is probably going to stop development but the source code is there and increased usage (in OpenACS) will increase the chances that someone will pick it up. This would benefit not only OpenACS but the whole AOLserver community since so far there is no fast and comprehensive support for XML in it.
Posted by Don Baccus on
Yes, I made the same observation about nsxml's usefulness to the opennsd community at large to Kapil and Ben in e-mail.  We're going to use nsxml and Ben will work on changing the APM over to it.
Posted by Kapil Thangavelu on
from ben on changing tcl code w/ query dispatcher calls

No, we definitely don't want to change the Tcl code this way. In
fact, for now, I would say don't change the Tcl code at all, in case
aD releases more Tcl code.

The Query Dispatcher includes hooks from the db_ api which will
automatically handle the dispatching. Right now, they still expect
default SQL. Eventually, we can take that default SQL out. But, as I
mentioned, I'd rather keep the Tcl code clean for now.
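The fallback behaviour Ben describes can be pictured with a small Python sketch: the db_ call keeps its default SQL, and the dispatcher substitutes a ported version only when one exists for the current RDBMS. `QUERIES`, the full-name scheme and `dispatch` are hypothetical; the real hooks live in the Tcl db_ API.

```python
# Hypothetical store of ported queries: full query name -> {rdbms: sql}.
QUERIES = {
    "acs-tcl.site_nodes.foo_select": {
        "postgresql": "select object_id from site_nodes where node_id = :node_id",
    },
}


def dispatch(full_name, default_sql, rdbms):
    """Return the RDBMS-specific SQL if it has been ported, else the default SQL."""
    return QUERIES.get(full_name, {}).get(rdbms, default_sql)
```

Because the default SQL stays in the Tcl call, an unported query simply falls through to it, and the Tcl files stay untouched in case aD releases more Tcl code.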

end from ben.

this looks like a much more elegant solution than what i thought the
query dispatcher was doing, very nice.

Posted by Don Baccus on
Glad you think so...

Now - where's that "query puller"? :)  Seriously, we're just about where it would be *very* useful.

I would've committed changes tonight but I ran into a (I should've known from first base) arsDigita bug that I presumed was a result of my modifications to the APM etc (after all, I do sometimes break things).

I think we need to set a Golden Rule of Porting aD software - if it breaks, assume first that aD broke it.

Harsh, but fits my experience.  I always assume it is me (or Ben or Dan or Roberto or whoever) first but most of the really dumb ones stem from the aD source.

Sorry, but I'm not in a very charitable mode this evening...I really wanted to commit my early changes tonight but my install-from-scratch efforts were getting errors, which it turns out I can replicate in 4.2beta by dropping and adding files via the APM (I'd dropped and added files to move them from "sql/" to "sql/oracle")

Posted by Stephen . on
I tried 4.2 beta recently and found the same problem. It wouldn't recognise new tcl files as type 'tcl_procs' for example. This is a bug introduced since 4.1.1. OpenACS is based on 4.2 beta right?
Posted by Don Baccus on
Yes, it is based on ACS 4.2.  The problem I ran into was that someone split acs-mail-create.sql into three pieces, each named "*-create.sql".  As distributed, two were tagged as being "data_model" and only acs-mail-create.sql was tagged as being "data_model_create".

Therefore only acs-mail-create.sql was loaded by the APM at install time.

However, we're moving datamodel files into sql/oracle and sql/postgresql subdirectories.  I moved acs-mail/sql/*.sql into sql/oracle, told the APM to remove the old files and add the new files.  Then trashed my test Oracle user and re-installed.  Then got errors as acs-mail-packages-create.sql was being run by the installer, as well as acs-mail-create.sql.  The package create failed of course as the tables weren't defined.

acs-mail-create.sql also ran acs-mail-package-create.sql so the end result was a proper install, but I figured I must've screwed up the APM somehow causing it to misidentify data_model_create files.

Not at all ... the screw-up was there in 4.2beta.

The fix is easy, I'll just rename the files...
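A rough Python sketch of the failure mode, assuming that when files are re-added the APM guesses each file's type from its name, checking rules in order with the first match winning. The rule table is hypothetical, not the APM's actual one; it shows why three `*-create.sql` files all come back as `data_model_create`, and why renaming the helper files fixes it.

```python
from fnmatch import fnmatchcase

# Hypothetical filename rules, checked in order; the first match wins.
TYPE_RULES = [
    ("*-create.sql", "data_model_create"),
    ("*-drop.sql", "data_model_drop"),
    ("*.sql", "data_model"),
    ("*-procs.tcl", "tcl_procs"),
    ("*-init.tcl", "tcl_init"),
]


def guess_file_type(filename):
    """Return the guessed file type, or None if no rule matches."""
    for pattern, file_type in TYPE_RULES:
        if fnmatchcase(filename, pattern):
            return file_type
    return None
```

Under these assumed rules a plain `foo.tcl` matches nothing at all, which fits the advice further down the thread to name library files `*-procs.tcl`.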

Posted by Andrew Piskorski on
I tried 4.2 beta recently and found the same problem. It wouldn't recognise new tcl files as type 'tcl_procs' for example. This is a bug introduced since 4.1.1. OpenACS is based on 4.2 beta right?

Stephen, ACS 4.2beta was the first 4.x that I used much, so I didn't even realize that was a bug, but FYI, I avoided it just by naming all my Tcl files "*-procs.tcl", as that seems to be the convention in the ACS packages I looked at.

Posted by Don Baccus on
Oh, I understand what Steve's talking about, now, I didn't quite get the gist of his message.

The APM recognizes two kinds of .tcl files in your tcl library.  *-procs.tcl are tcl library files, *-init.tcl are tcl initialization files.  The notion is that all tcl library files are sourced, resulting in your package API being totally defined, then the tcl init files are sourced.  Doing things in this order makes it possible for the *-init.tcl files to use the full package API defined in the *-procs.tcl files.
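The two-phase ordering described above, as a minimal Python sketch: every library file (`*-procs.tcl`) is placed before any init file (`*-init.tcl`), so init code can rely on the complete API. The function name and the alphabetical order within each phase are assumptions for illustration.

```python
def source_order(tcl_files):
    """Order files so all *-procs.tcl come before all *-init.tcl.

    Files matching neither suffix are left out, mirroring the point that
    only these two kinds are recognized.
    """
    procs = sorted(f for f in tcl_files if f.endswith("-procs.tcl"))
    inits = sorted(f for f in tcl_files if f.endswith("-init.tcl"))
    return procs + inits
```

Sorting by phase rather than by package is what guarantees an init file never runs before some library file it depends on.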

Re Don's question about Oracle Java.

There's now a python stored procedure language for postgresql
which can import arbitrary cPython modules (the list of modules is
currently statically compiled, but that will be fixed). It should be
able to provide a postgresql equivalent for anything aD does in java for the Oracle version - eg the unit tests include a hash function similar to what's currently needed for the commented-out routines in the kernel.

Crypto hash functions running in C from python standard modules should be more efficient than java as well as easier to program. Might also speed up the trees stuff.

Plenty of stuff for mail etc in python modules - like perl, it's got a
comprehensive collection of add-on modules (still smaller, but much
easier to use).

http://users.ids.net/~bosma/

BTW It uses the SPI backend same way as native plpgsql with
prepared queries so can be just as efficient as native query
planning. (Prepare the query plans when first establishing
a connection, then reference them from the global dictionary
they are stored in to use them later. The global dictionary
is per connection/backend and there is also another per
function - neither know anything about statements or
transactions).

Could also be interesting way to make use of the
Query Dispatcher in Postgresql specific version.

Instead of parsing each query each time it is called (no
big deal) and preparing an expensive postgresql query plan
each time (sometimes a big deal), it might be possible to get the
highest possible performance with pre-planned queries, using
auto-generated functions that correspond to the same apis but
work with the query plans dictionary rather than directly.

Also perhaps relevant to a problem I see coming when aD migrates
logic from PL/SQL to web platform java as they move from
ACS 4.x java to ACS 5.

Postgresql version could just keep the logic inside
the database where it belongs - especially with an ORDBMS
like postgresql cf a primitive RDBMS like Oracle 😉

plpython released 31 March, still being polished:

http://users.ids.net/~bosma/

Author: Andrew Bosma [mailto:andrew@corvus.biomed.brown.edu]

Note this is entirely independent of porting to python
web platforms. The functions created are just like any
other postgresql functions, and are used the same way from
any adaptor for any web platform.

Ask the author any further questions after reading the docs
and code - I'm unlikely to know the answers.

Posted by Don Baccus on
The new plpython work is very cool, no doubt.  My major objection to using it to work around the embedded Oracle Java stuff is exactly the same as my objection to the embedded Oracle Java stuff in the first place - it is horribly unportable.

The problem is our desire to support other RDBMS's in the future combined with the fact that some don't feature any decent embedded language facility with all the needed hooks to the system environment.

So my feeling is that if we were looking at OpenACS as solely being an Oracle + PostgreSQL solution, the path you suggest would be reasonable.  But given our interest in other RDBMS platforms - including even MySQL if the new InnoBase backend matures and performs as advertised - I'd prefer to avoid trading one unportable solution for another.

As far as storing queries in functions to avoid query planning overhead, Karel Zak wrote an add-on that allows you to cache queries.  I've had my eyes on that.  It wasn't released with PG 7.1 but I think it might be released with PG 7.2.  Then the query processor could tell PG to save the query plan and execute it.

We are looking at creating functions as you suggest for the inline stuff called from db_exec_plsql, in fact Dan's committed a simple version that creates and drops a function wrapper on each call.  Later we'll want to keep track of these anonymous functions so they don't need to be recreated for each call - Dan's hack is meant only as something to keep this from being on the critical path.
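A minimal Python sketch of the "keep track of these anonymous functions" idea: each distinct PL/pgSQL block gets a wrapper function created once and reused on later calls, instead of being created and dropped every time. `execute` stands in for whatever sends SQL to the database (hypothetical), and the generated SQL is illustrative only, assuming a PG 7.1-style single-quoted function body that returns integer. Since functions persist in the database, a real version would also need to cope with wrappers created by another connection or left over from an earlier run.

```python
class PlsqlWrapperCache:
    """Create one wrapper function per distinct anonymous block, then reuse it."""

    def __init__(self, execute):
        self.execute = execute   # hypothetical callable that runs one SQL statement
        self.wrappers = {}       # block text -> generated wrapper function name

    def call(self, block):
        name = self.wrappers.get(block)
        if name is None:
            name = f"exec_plsql_{len(self.wrappers) + 1}"
            body = block.replace("'", "''")  # double any single quotes in the quoted body
            self.execute(
                f"create function {name}() returns integer as '{body}' "
                f"language 'plpgsql'"
            )
            self.wrappers[block] = name
        return self.execute(f"select {name}()")
```

Keying the cache on the exact block text keeps the bookkeeping trivial; two blocks that differ only in whitespace would simply get two wrappers.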

Posted by Albert Langer on
Don,

Thanks for the detailed response. Don't want to get into an argument
now as:

a) I'm basically just a lurker passing on info and not planning to
contribute to the urgent work going on at the moment.

b) More urgent at the moment to just keep it rolling as is, rather
than discuss things that aren't on critical path.

But here's some thoughts to keep in mind for later.

I agree that any efficiency improvements in query planning should
just be kept off critical path. Presumably db_exec_plsql achieves
that. But also presumably anything to do with query planning is
just as postgresql specific whether done one way or another.

Ensuring "portability" is different from actual "porting".
"Portability" just means not creating unnecessary
*obstacles* to later porting. Doesn't necessarily mean
doing much extra work up front to *further* reduce effort
of porting later by a more "generic" solution - that is
often illusory anyway.

Re portability generally, I suggest this be reviewed when the
current pressure is off and situation with ACS4 java and ACS5
more clear (and also concrete plans for other DBMS ports more clear).

The way I see it there's going to be a major problem keeping up
with aD's migration from Tcl to java.

That ACS roadmap from aD has a lot of food for thought:

http://www.arsdigita.com/wp/display/26811/26812.wimpy

My understanding is that the data model will shift in successive
ACS 4.x java releases as they move to both less RDBMS dependence
and *more* java dependence.

That may mean there should be less embedded Oracle java.

But it could also mean it's going to be a lot harder to
keep up with new modules from a Tcl platform that's been
abandoned by aD when they are doing more and more stuff
on the web application platform in java.

Up to now OpenACS has been focussed on PostgreSQL
(and AOLserver/Tcl).

It obviously makes sense to support Oracle as well for
the existing base of ACS classic users that are being
abandoned, and at the same time make it *easier* for
people who want to do ports for firebird/interbase, sap db
etc.

At present the work is still on doing a PostgreSQL
port in a way that avoids putting obstacles in the way
of other ports, unlike the previous situation where you
faced a severe obstacle now resolved by db-api.

Any other ports will still have to do a lot of work.
If the current port hasn't done the work they will need
to cope with problems in the stored procedure languages
available to those RDBMSes, that just means they will
have to do that work for their RDBMS/web platform
where it's easier with PostgreSQL. It doesn't mean the current
work is creating obstacles, or that it should avoid using
postgresql-specific solutions to get what it's now doing done
more easily. It should just avoid creating obstacles by
specifying appropriate interfaces.

eg You've got java crypto hash functions for Oracle and
can get them easily for PostgreSQL. Whoever needs them
for some other RDBMS will also need to solve the problem, so
any (layered) API adjustments necessary to avoid creating an
obstacle for them should be done, in case it turns out *they*
have to do it within the web platform instead of more easily
within the database as with Oracle and PostgreSQL. But a PostgreSQL
version including such necessary kernel procedures should
not be held up now just in order to save time for somebody
else doing something else later.

There's good reasons for having chosen PostgreSQL and
even when others are available they will still be good
reasons. One of them is that it can keep up with or
surpass Oracle.

Likewise AOLserver/Tcl is clearly the only relevant
platform for OpenACS right now. But others are likely
to become of interest once that's been done for OpenACS 4,
even if not done by same people working on current port.

Keeping logic in the database may enhance web application
platform portability more than it reduces RDBMS portability.
That's especially relevant with issues re AOLserver and likelihood
of people wanting to work with java based modules released by aD.
A lot of OpenACS users are likely to be disk bound due to not
having bucket loads of spindles like a large site, so the
performance of AOLserver may not be that relevant to them.

Proof that the ACS data model is what it's about, not the specific
code, has been demonstrated on the one hand by the porting to a
different dialect here, and on the other hand by shift to an entirely
different web platform at aD.

OpenACS could well be an umbrella for people
wanting more "Open" development of ACS with respect to web
platforms as well as SQL dialects. (Whether willingly or not
eg I noticed an item about someone being required to use PHP).

It isn't called OpenTclACS even though that's the current reality.

Doing application logic intended for java in Tcl could turn
out to be both more difficult and a bigger obstacle to acting
as an umbrella than doing some of it embedded in stored procedures
(as at present).

Anyway these are clearly issues for the core team and other
developers to resolve later, rather than for me to argue
about now, so I'm just passing on the thought.

Press on regardless 😉

Posted by Don Baccus on
Well, to some degree I'm going to have to disagree with your comment regarding putting in extra up-front work to make porting easier. Our job would've been far easier if aD had taken portability as a goal.

This is, for instance, the major reason why they're pulling application logic out of the RDBMS and into Java. In most cases the effort involved isn't much different for the initial implementation, but it makes a huge difference in the level of difficulty involved in porting to a new RDBMS.

I have a fair amount of experience with portability issues, as my old software company (now defunct but in business for 13 years) was based on optimizing compiler technology designed and mostly written by me, supporting four languages and about a half-dozen minicomputer and microprocessor families.

I should clear up one misunderstanding - the OpenACS project, at the moment, isn't structured to keep up with the aD migration from Tcl to Java. For one thing, aD isn't really planning a migration path and recommends that those who want to dive into their Java future do so ASAP. It's more of a break than a migration, in other words, at least for the code (as opposed to all those migratory users!)

What we call OpenACS 4.x, then, is now and forever will be Tcl based. I say this with assurance because aD isn't going to cut right to ACS 5.x. In other words, there will soon be no confusion between ACS 4.x vs. ACS 5.x once aD follows through on the published plan. ACS 4.x == Tcl == the past. ACS 5.x == Java == a bright shiny future.

If the Tcl version, i.e. OpenACS 4.x, continues to evolve it will be due to the efforts of the community here, not aD.

Now - will this community drift off and adopt Java wholesale, or will enough folks be satisfied with OpenACS (Tcl) 4.x to grow it and continue to add to its functionality? I don't know the answer, frankly. My guess is "Yes, this community will continue to grow OpenACS 4.x and not just run to ACS 5.x". But I could be wrong.

It's not something I'm thinking a lot about, frankly. ACS 5.x isn't going to be available for some time. aD is interested in multi-db support but has no resources available to work on actually supporting anything other than Oracle with ACS 5.x for the rest of the year (as announced in the conference call). We can't port it ourselves because it doesn't exist.

So ... not being able to solve problems that are totally out of my control, I'm perfectly willing for the moment to ignore their very existence. We'll see what happens with ACS 5.x and we'll see what happens with aD and their efforts to bolster their image in their user community and we'll see what happens with the folks here.

Actually, when you say things like this:

eg You've got java crypto hash functions for Oracle and can get them easily for PostgreSQL. Whoever needs them for some other RDBMS will also need to solve the problem, so any (layered) API adjustments necessary to avoid creating an obstacle for them should be done, in case it turns out *they* have to do it within the web platform instead of more easily within the database as with Oracle and PostgreSQL. But a PostgreSQL version including such necessary kernel procedures should not be held up now just in order to save time for somebody else doing something else later.

I get the impression we're thinking much alike. The specifics regarding the use of Oracle SMTP utilities fall into an area where there's already perfectly reasonable support within OpenNSD, so dragging that stuff out of the db and into the Tcl application layer is no big deal and solves the problem for any db (by removal).

I'm sure there will be instances where we won't seek an immediate all-encompassing solution, though. In particular, the very existence of PL/SQL and PL/pgSQL code is essentially dodging the bullet. If the OpenACS 4.x community thrives and doesn't drift off to Java-land, and if there are those in the community who want to support (say) InterBase, well...there will be a lot of work to do due to all that programmatic code buried in Oracle/PG.

At that point, it might make sense to start pulling logic out of the database and ending the dependence on PL/SQL+PL/pgSQL but that is a bridge that will be crossed later. If ever.

From reading your post I get the impression you think that it might be possible to use Java modules with the ACS Tcl 4.x core. If I read you correctly, I'll have to point out that as far as I know aD has absolutely no intention of supporting hybrid sites. I don't think that scenario is a viable one, frankly. You might be able to migrate to Java by rewriting your entire site (i.e. your customizations to ACS 4.x) and the similarities in datamodels may make it possible to migrate your accumulated data, but to do so will mean switching from ACS Tcl to ACS Java entirely.

I think the bottom line is that the Java future will represent a clean break for those who take that path. No code blending. Datamodel similarities will lower the bar for those who want to switch, that's about it. I don't think any decision we make regarding how much logic to leave in the db vs. the Tcl application layer will be relevant in practice.