Forum OpenACS Development: Invalid XML in XQL files

Posted by John Sequeira on
I am bolting query-dispatcher awareness onto Michael Cleverly's nstcl library,  and I'm having trouble parsing the xql files.  I was wondering if anyone knows why the xql files don't use CDATA wrappers around the querytext sections?

In other words,  instead of this

<?xml version="1.0"?>
...
<fullquery name="db_api_acceptance_test_select_asdf_sysdate_from_food_2">
      <querytext>
      select asdf, sysdate as datestr from footest where asdf > :asdf
      </querytext>
</fullquery>

which is not well-formed XML once a query contains < or & (and thus chokes tDOM), why not use this:

<?xml version="1.0"?>
...
<fullquery name="db_api_acceptance_test_select_asdf_sysdate_from_food_2">
      <querytext><![CDATA[
      select asdf, sysdate as datestr from footest where asdf > :asdf
        ]]></querytext>
</fullquery>

I'm doing a regsub find-and-replace to turn them into parser-friendly xql, but I'm wondering whether fixing them at the source is an option. Non-standard XML seems to defeat the purpose 😊
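For anyone who wants to try the CDATA approach, here is a minimal sketch of the find-and-replace idea, written in Python rather than Tcl purely for illustration. The helper name wrap_querytext_in_cdata and the sample query are made up for this example, and it assumes that querytext elements carry no attributes and are never nested.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: <querytext> has no attributes and is never nested.
QUERYTEXT = re.compile(r"<querytext>(.*?)</querytext>", re.DOTALL)


def wrap_querytext_in_cdata(xql):
    """Wrap each <querytext> body in a CDATA section (hypothetical helper)."""

    def repl(match):
        body = match.group(1)
        if body.lstrip().startswith("<![CDATA["):
            return match.group(0)  # already wrapped, leave untouched
        # A literal "]]>" would end the CDATA section early, so split it
        # across two adjacent CDATA sections.
        body = body.replace("]]>", "]]]]><![CDATA[>")
        return "<querytext><![CDATA[" + body + "]]></querytext>"

    return QUERYTEXT.sub(repl, xql)


# Made-up sample: the bare "<" makes the unwrapped file ill-formed.
RAW = (
    '<?xml version="1.0"?>\n'
    "<queryset>\n"
    '<fullquery name="example_select">\n'
    "      <querytext>\n"
    "      select asdf from footest where asdf < :asdf\n"
    "      </querytext>\n"
    "</fullquery>\n"
    "</queryset>\n"
)

if __name__ == "__main__":
    fixed = wrap_querytext_in_cdata(RAW)
    query = ET.fromstring(fixed).find("fullquery/querytext").text
    print(query.strip())
```

Running the helper a second time leaves already-wrapped sections alone, so it should be safe to apply repeatedly to the same tree of files.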

2: Test script (response to 1)
Posted by John Sequeira on
FWIW here's a test script

[... set root to root folder of openacs]
package require tdom
package require fileutil

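set files [::fileutil::findByPattern $root *.xql]
foreach file $files {
  # dom parse raises an error on the first file that is not well-formed XML
  set doc [dom parse [::fileutil::cat $file]]
  # free the parsed tree so memory does not grow with every file
  $doc delete
}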

If this doesn't croak,  then the xql files are fine.
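A variant that keeps going after the first failure and reports every offending file can be sketched as follows. This is an illustrative Python version using the standard library parser, not tDOM, so the exact error messages will differ; the function name find_bad_xql is an assumption made for this example.

```python
import os
import sys
import xml.etree.ElementTree as ET


def find_bad_xql(root):
    """Return (path, error message) for every *.xql file that fails to parse."""
    failures = []
    for dirpath, _dirs, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".xql"):
                continue
            path = os.path.join(dirpath, name)
            try:
                ET.parse(path)
            except ET.ParseError as err:
                # Record the problem and keep scanning the remaining files.
                failures.append((path, str(err)))
    return failures


if __name__ == "__main__":
    # Usage: python check_xql.py /path/to/openacs (defaults to current dir)
    top = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, message in find_bad_xql(top):
        print(f"{path}: {message}")
```

An empty result means every file parsed; anything else gives a list of files to fix.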

3: Re: Test script (response to 2)
Posted by Jeff Davis on

I asked Don about this before when I wrote some validation utilities for the xql files. I think his view was that it was a little annoying to write correct xml (manifestly true) and that turning all the extant files into valid xml was not a high priority.

I did validate all the files in 4.6 at one point, so apart from the quoting problem they should now all be clean (and QD does an ns_quotehtml on the query text before it feeds it to ns_xml, so it is actually parsed with a standard XML parser rather than something hand rolled).
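The quote-then-parse idea can be sketched roughly like this. It is an illustration in Python, not the query dispatcher's actual code: escape from the standard library stands in for ns_quotehtml, the helper name quote_querytext is made up, and it assumes the query text is raw SQL with no entities or CDATA sections already in it (those would end up double-escaped).

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Assumption: <querytext> has no attributes and is never nested.
QUERYTEXT = re.compile(r"(<querytext>)(.*?)(</querytext>)", re.DOTALL)


def quote_querytext(xql):
    """Escape &, < and > inside every <querytext> body before parsing."""
    return QUERYTEXT.sub(
        lambda m: m.group(1) + escape(m.group(2)) + m.group(3), xql
    )


if __name__ == "__main__":
    sample = (
        '<queryset><fullquery name="q">'
        "<querytext>select 1 from t where a <= :b</querytext>"
        "</fullquery></queryset>"
    )
    root = ET.fromstring(quote_querytext(sample))
    # The parser undoes the escaping, so the SQL comes back unchanged.
    print(root.find("fullquery/querytext").text)
```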

I think it's easy enough to clean up with a simple script; maybe it's time to do it. I would also like to do some stuff to cache the parse across restarts, since I would like the server restart to be quicker and, as it stands, the QD parse is an appreciable fraction of the startup time.

Posted by Don Baccus on
Someone should probably ping Ben on this since he wrote the QD, but I think Jeff's summary of my position's more or less correct.  Another way to put it: the Query dispatcher transforms .xql files into valid XML before parsing them and my presumption has been that Ben did this by design, to make the query files less obnoxious to write.

But I've never asked him ...

Caching for quicker startup would be nice, of course, but that's another issue.

Yet another issue that we need to tackle soon is to drop libxml and switch to Zoran's tDom AOLserver module.  It was supposed to be "released any day now" late last summer so one would hope it's actually available now?  Does anyone know?  ns_xml only exposes a trivial subset of libxml and since everyone seems to believe that Zoran's stuff's better than libxml and since supposedly he's exposed everything in his AOLserver module, it seems much more reasonable to switch rather than extend ns_xml into something more useful.

Posted by Bart Teeuwisse on
Yet another issue that we need to tackle soon is to drop libxml and switch to Zoran's tDom AOLserver module.  It was supposed to be "released any day now" late last summer so one would hope it's actually available now? 

Don, tDOM 0.7.5 was released 2002-11-27 and is available from http://www.tdom.org/. It works great with AOLserver!

Let's switch over to tDOM. And now that we are on the subject of XQL files, could we please include an XQL DOCTYPE at the top of each XQL file? I can provide a DTD for the type. With a DOCTYPE in place XQL files are a snap to create/modify by hand when using Emacs.

/Bart

Posted by C. R. Oldham on
Sorry if this has already been discussed.  Is tDOM a drop-in replacement for nsxml?
Posted by John Sequeira on
C.R.,

I've written a tiny compatibility layer mapping ns_xml commands to tdom commands as part of my portable.nsd effort.  It's not quite drop-in,  but close.

Posted by Jeff Davis on
As much as I hate having queries in a separate file, I am not sure that pre-processing the Tcl is such a good idea. If the first step in installing OpenACS is "compile the actual Tcl scripts", I would have serious concerns that we would see two things:
  • Mysteriously broken code resulting from errors in the parser or quirks in the particular source Tcl.
  • Even more difficulty in merging improvements from particular sites back into the code base.
If it were a cleaner language or if we had a full parser I would not worry quite so much, but regardless of how good the parser is, the latter problem will remain.

I don't see a lot of evidence that the QD overhead is anything to be that concerned about relative to going to the DB in the first place.

I think there is probably a better argument for precompiling adp files rather than doing them on the fly, although there it is much clearer that what you are doing is compiling and no one would really be tempted to discard the adp's and work directly with the resulting code.

Posted by Tom Jackson on

Could you create a new mega-file with all the queries, or maybe just a tcl file that runs a series of commands, avoiding parsing completely? Seems like the tcl file could check mtimes first and only parse new/changed stuff.
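The mtime check could look something like the sketch below. It is written in Python for illustration only; the class name QueryFileCache and the parse_count counter are assumptions made for this example, and it expects files that are already well-formed XML.

```python
import os
import xml.etree.ElementTree as ET


class QueryFileCache:
    """Re-parse a query file only when its modification time has changed."""

    def __init__(self):
        # path -> (mtime in ns at last parse, {query name: SQL text})
        self._entries = {}
        self.parse_count = 0  # counts real parses, handy for measuring

    def load(self, path):
        mtime = os.stat(path).st_mtime_ns
        cached = self._entries.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]  # unchanged on disk, skip the parse
        queries = self._parse(path)
        self._entries[path] = (mtime, queries)
        return queries

    def _parse(self, path):
        self.parse_count += 1
        root = ET.parse(path).getroot()
        return {
            fq.get("name"): (fq.findtext("querytext") or "").strip()
            for fq in root.iter("fullquery")
        }
```

To survive a server restart the table would also have to be written to disk, which this sketch leaves out.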

Posted by John Sequeira on
I would like to propose an alternative to the megafile. I know there's a Python script somewhere that moves SQL out of the Tcl files and into the XQL files. I've prototyped a script that does the reverse: it takes queries out of the XQL files and inlines them into the Tcl scripts. It then deletes or renames the XQL files so that they're not found and parsed on start-up, and the query-dispatcher commands effectively become no-ops.

The advantage of using this is that you don't take the XQL parsing hit on server start-up (which is substantial),  and you don't have to run through the query dispatcher code on each query or chew up more memory caching queries that are stored in bytecode-compiled tcl.

I initially did this so that I could get the OpenACS source code to play nice with nstcl, but it seems like it could be useful to others. I can definitely see why you'd want to have XQL files - they came in handy when I worked on porting the OpenACS kernel to MSSQL, for instance. However, for people developing apps in OpenACS (as opposed to people developing OpenACS itself) it seems as though the XQL abstraction just slows things down.

Using the script wouldn't require re-architecting OpenACS or even any kind of direct support,  but I just wanted to broach the idea before time was spent on upgrading the qd.

Posted by Roberto Mello on
John,

Where can one take a look at that script? Also, how does it account for the use of db_map?

Thanks,

-Roberto

Posted by John Sequeira on
Roberto,

It doesn't currently handle db_map.  I think there are only about 10 uses of it in the kernel,  so I hadn't gotten there yet.

It seems as though replacing db_map calls with { db_map_sans_qd  "string-from-xql-file" } wouldn't be very difficult...

I'll email you the script.

Posted by Bart Teeuwisse on
John,

there are a number of packages outside the core that use db_map quite a bit. The ecommerce package for one.

While the reverse script might be a blessing if you develop for one database I feel that the idea goes against the purpose of keeping OpenACS DB neutral.

By re-inserting the SQL queries in the TCL files, developers are encouraged to change/fix queries in the TCL files rather than making changes to the XQL files.

Of course it would be possible to create new XQL files by re-extracting the SQL queries, but my mileage with the extraction script has varied.

I'm intrigued by your proposal but think that it could create a lot of confusion when it comes to developing and maintaining OpenACS packages.

/Bart

Posted by John Sequeira on
Everyone's made some good points as to why OpenACS developers wouldn't want to inline queries. And Jeff's point about the parser being difficult to implement is well-taken.

For my current effort,  OTOH,  minimizing startup time is of huge benefit.  I spent 15 minutes and implemented OACS_FULLQUERIES as a persistent array (see http://mini.net/tcl/3469 ),  and just made db_qd_load_query_file a no-op.  This benefited startup time enormously,  as you'd expect,  and my work now proceeds faster.  OpenACS' implementation of OACS_FULLQUERIES is different than my/nstcl's own,  but using a trace in this way would probably achieve the desired results.
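The persistent-array trick can be approximated as in the sketch below. It is a Python illustration, not the code from the wiki page linked above: overriding __setitem__ plays the role of the Tcl write trace, the class name PersistentDict and the JSON file format are assumptions, and only plain item assignment is persisted.

```python
import json
import os


class PersistentDict(dict):
    """A dict that reloads itself from disk and saves on every assignment."""

    def __init__(self, path):
        super().__init__()
        self._path = path
        if os.path.exists(path):
            with open(path, encoding="utf-8") as fh:
                # Bypass __setitem__ so loading does not rewrite the file.
                super().update(json.load(fh))

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self._flush()  # analogous to a write trace firing on the array

    def _flush(self):
        # Write to a temporary file first so a crash cannot leave a
        # half-written store behind, then swap it into place.
        tmp = self._path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as fh:
            json.dump(dict(self), fh)
        os.replace(tmp, self._path)
```

With something like this holding the parsed queries, the loader can be skipped on later start-ups as long as the query files have not changed.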

Anyway,  although I can see why it's not appropriate for the toolkit,  I have only worked on single-db projects to date and I still like the idea of inlining.  I might revisit it someday by porting my ugly perl script to a tcl one that uses ParseTools ( http://www.oche.de/~akupries/soft/ptools/ ).  If it worked as advertised,  it might actually do the trick.

Posted by Dave Bauer on
The Query Dispatcher aliases the ns_xml commands with its own XML processing procs.

So just changing those few procs to use the tDOM API should be easy.

tDOM uses the Document Object Model which is a standard API for accessing XML documents.

Posted by Don Baccus on
People who are developing single-DB applications in OpenACS don't need to use query files at all for their custom code.

People who are developing multi-DB applications will want to use query files.

The QD overhead, as Jeff suggests, appears to be very low when compared to the expense of executing a query.  There's really nothing to be gained here performance-wise.

It does require some memory ... on OpenACS.org about 3MB IIRC (we looked at this when trying to figure out why our site was eating up so much memory). On a modern machine that's not a large number. And if we follow through and pull queries out of the Tcl files belonging to the standard packages supported by the OpenACS project, the compiled Tcl bytecode will of course shrink by a roughly comparable amount, so the overall overhead is very low.

The single issue which I think all would agree is important is the start-up issue, as start-up time can be fairly slow on sites that use a large number of packages.  It's not just query files, though - we're compiling every */tcl/*-[init/procs]*.tcl file on startup, too.  It's not clear how much can be done about this.

I really don't see much value in a tool that folds .xql files back into .tcl files, to be honest.  It's not something I'd ever use.