Forum OpenACS Development: Response to Query Dispatcher initial specs posted

Collapse
Posted by Ben Adida on
Thanks for these useful comments!

I think I should clear up something that Don and I have assumed but which we haven't made completely clear yet. Today, PostgreSQL is the only open-source RDBMS we would want to use OpenACS with (because we need transactions and real stored procedures). Thus, I understand that some might question the reason for multi-RDBMS capability. However, it's relatively clear that ArsDigita's focus is now on Java, and not so much on Tcl. This leaves OpenACS as the "keeper of the Tcl faith." And it means we really want to keep OpenACS compatible with Oracle. And potentially compatible with DB2 and other closed-source RDBMSs. Plus, who knows, one day MySQL might be a real RDBMS, and then we wouldn't want to have an architecture that is single-RDBMS only, making us look a lot like the ArsDigita we criticized a few months ago. Thus, the real push for multi-RDBMS.

Now, to answer a few issues.

  • three files for each web page: I think the move from 2 to 3 is actually not a big deal, as long as things are clearly laid out in the file hierarchy. Also, if we're going to have arbitrary compatibility with various RDBMS systems, I'm really not sure how we can preserve the simple development environment without going to 3 files. Having a central SQL file or a thematic system makes it certainly no simpler to understand or program, does it? You still have to have 3 buffers open. And I can't think of an easier way of locating a query than by finding a file with the same name in a parallel directory.

  • steeper learning curve: yes. OpenACS 4.x will have a slightly steeper learning curve than 3.x. Building more maintainable, more structured code can only happen if you teach your developers a structure by which they can abide.

  • caching of query results: This is a serious philosophy difference with the way ACS/OpenACS has functioned in the past. The master data is always in the database. Cached queries sound like a great idea, but they result in horrendous cache management, dirty reads, etc... I think if someone wants that functionality (and sometimes that is warranted!), they should explicitly use util_memoize. There is danger in easily allowing caching where that is inappropriate, and I would strongly vote against that idea.

  • memory usage: I doubt that there is more than 1 meg of SQL queries in a running ACS installation. However, assuming this gets very large, we can always use some level of cache flushing. I doubt this will be necessary, but certainly there are plenty of cache size-limiting methods that we can implement here without ever changing the specs.

  • function call overheahd: Compared with the overhead of performing a DB query, function call overhead seems minimal. In high-end architectures, where the DB is going to be on a separate machine with network overhead, this seems negligible to me. At the end of the day, I'm not quite sure how to make this more efficient without sacrificing significant functionality.

  • thread contention for shared memory: Inherently, it seems the thread contention on DB pool data structures is much worse than the contention on a bucket of nsv vars (at least for the first query in a page). I can't guarantee that this won't become a problem on pages with lots of serial queries (where the DB pool contention is no longer an issue, but the nsv vars are). If someone wants to add some insight here, that would be great. I do think that the AOL guys optimized nsv's pretty well, though.

  • SQL92 queries: there is no explicit tagging of SQL92 queries. Sorry if that was unclear. It's just that we will try to make the default query (the one that is tagged with neither RDBMS type nor version) as SQL92 compliant as can be. Thus, when an RDBMS isn't SQL92 compliant for that query's constructs, we will have to implement a query specific to that RDBMS, and tag it appropriately. Thus, the more SQL92-compliant an RDBMS, the easier it will be to port OpenACS to it.

  • Tcl-proc version of dispatching: I thought about this a while ago, having some smart Tcl approach to dispatching queries. The one presented by Pascal with a proc_doc approach is cleaner than anything I'd thought of, and looks fairly cool. The problem is that this is Tcl-dependent. Why is that a problem? Because, even though I don't have an immediate interest in OpenACS/Java today, that may change in the future. If OpenACS/Java is going to be a straight Tcl/Java port, then the queries will remain the same. If the queries remain the same, we need a programming-language independent means of storing the queries. In general, if we're going to separate SQL from code, we don't want to make our SQL-dispatching dependent on a particular programming language.

  • Verbose query names: Ryan's comment is interesting. How does one reuse queries? It would be nice to have a naming scheme that allows for that. The problem is that current packages and current naming schemes assume uniqueness by Tcl page or Tcl proc, nothing more. Thus, I'm afraid there might be conflicts if we go to a package-level naming scheme. Maybe we can dedicate part of the namespace to package-shared queries. Or we can use the Tcl proc wrapping idea. I'm not 100% sure. But this is a very good point. As long as we're naming queries, reusing existing queries is something we should be able to do easily.

  • XML parsing and spec: Yes, the current XML spec is primitive, and I'm happy to take suggestions in improving it. While I'm very familiar with XML parsers and knowing where XML is useful, I haven't designed many DTDs to date. Someone who has should jump in and help me out here. Oh, and having the FullQuery name in the XML is obvious... sorry about that one! Good catch. As for having the filename and proc name in the XML, that will be part of the FullQuery name already (once that's fixed of course!).

  • Location of XML Query files: Ideally, there would be three directories (www, templates, and sql), ACS 3.x style. However, ACS 4.x chose to co-locate the .tcl and .adp files. Unless we want to change that, too, I don't think it makes sense to create a parallel structure for .sql files. What do people think?

  • Making a Developer's Life Easier: Kapil, I think your code is cool. Python or any other language, it's useful stuff. More than just cool, I agree with your approach. If people want to code for one DB and then want to use your extractor to make it more multi-RDBMS friendly, then that's awesome. If you polish up that code, make it more modular, and eventually make it fit the final spec we agree upon for the layout of these files, I will happily include your code in the distribution as a developer tool.

These are great comments. I will begin to edit the specs with some of these good ideas. In the meantime, please feel free to continue to contribute! This is great stuff!