Forum OpenACS Improvement Proposals (TIPs): TIP #37 (Rejected): Queries in XQL file conventions
- Allow queries that are common between Oracle and PostgreSQL to be in the .tcl file.
- One less file to open/miss/dig around in when developing or troubleshooting.
- Queries in .xql files are not ordered, making it even more bothersome to find the right query. By eliminating some of the .xql files, the problem is lessened.
- Query names are easy to get wrong, causing your code to not work. By eliminating some of the .xql files, the problem is lessened.
- When you add a new .xql file, you must then remember to visit the APM to watch it, including if you have a "watch all files in package", because this doesn't actively scan for new files to watch. By eliminating some of the .xql files, the problem is lessened.
- The unnecessary .xql file is just one of a long list of nuisances which new developers must go through.
- If some day we should decide to add another database maybe, which could not use the same queries as PG and Oracle, we would have to move a few of those queries out into .XQL files then.
1) given how different PG and Oracle are on most points, I doubt that we'd find a DB that doesn't support queries currently supported by both Oracle and PG.
2) if that were to happen, we'd probably have to make other changes as well, including changes in Tcl.
3) adding a third database is very unlikely, given how much of a nuisance having to develop and test on two databases (not to mention, installing Oracle) is alreeady considered to be.
ADDITION since Don's comment on consistency:
- Inconsistent with most current code, which has all queries in .xql files, regardless of DB-specific or not.
However, the main comment that comes to mind is that in order to do this you will have to make sure that every single bit of SQL that has been left behind in existing queries has been removed or replaced. I know that this has been an ongoing process but there is probably some left. This could be a huge job, or not, depending on how much has already been done.
If we take this route we'll end up with a toolkit that has some generic queries embedded in Tcl, others in .xql files, no consistency whatsoever.
In order to get consistency, which is easier? Move the embedded queries in your new code to .xql files, or to get rid of all the generic .xql files and embed these queries in .tcl files?
Because the latter is what we should do to avoid a mish-mash if your proposal is adopted. We should shoot for consistency and and either have only RDBMS-specific queries in RDBMS-specific .xql files, with all generic queries embedded in .tcl, or continue according to today's development standard, all queries in .xql files.
However, I'll put my two cents in for the idea of having all the queries in the tcl scripts. I don't know if it could be reasonably accomplished, having given no thought to how it would be implemented, but strictly from a programming usability POV having the queries in seperate files drives me nuts. It's much harder to visualize what an unfamiliar piece of code does without being able to see all the code, no matter how well-named the query is.
One thing that makes queries different is the join semantics. If joins were migrated to views, maybe the semantics in the .xql files could be the same, allowing you to reduce the number of page-oracle.sql and page-postgresql.xql files. This would also allow reuse of the view in non-UI scripts and in pl code.
Janine ... when we first discussed supporing both Oracle and PG and (potentially) other RDBMSs there was a lot of discussion of the merits of putting conditionalized queries inline in .tcl files and doing separate query files. Both solutions have inherent uglinesses about them, the queryfile approach was adopted as the lesser of evils solution after a bunch of discussion among those who were going to do the porting.
I could adopt to either using query files exclusively or dumping them entirely, though. It's this notion of having some of the toolkit programmed one way, the rest another way that irks me.
We already support too many different ways of doing different things IMO - look at the recent discussion regarding ad_proc parameter conventions!
This is just how I feel, though ...
Lars - one (slight) benefit is that inline queries are slightly more efficient, though in practice there's no evidence that queryfiles are a barrier to adequate scalability (the RDBMS parse, optimization and execution time far outweighs query lookup time).
Deciding to use namespaces, named parameters, ad_form, /www/resources and /lib directories, or any other new preferred way of doing things requires a break with the past, and, when resources allow, reworking existing code to match the new coding standard.
Then once we've found the new coding standard, we can determine how to get there. This would fall within a range of options, including:
- It's too much work, stick with and enforce what we've done in the past
- Feel free to change old style to new style when you come across it
- Stick with old style for current release, and in a future release we'll rework the existing code
- Rework all existing code now
Needless to say, I'll be happy to follow whatever decision we make through the TIP process.
Also needless to say, I personally find that moving all queries to .xql files is a waste of time.
Regarding reuse of queries, I find that wrapping them in procs allows for cleaner reuse than using some other proc or page's query. The query executed by a proc is part of its internal implementation, not part of the public API guaranteed to provide backwards compatibility, and thus shouldn't be done.
Alternatively, a package could provide a set of public named queries that would be part of the package's public API, which could then be used both by the package's own procs and pages, as well as other packages.
As far as the general philosophy regarding changes to coding standards go ... I don't see much sense in deciding to make major changes to the way we do things unless we're also going to decide to make old code adhere to the new way of doing things.
And if we're going to drop query files for generic queries while maintaining them for RDMBS-specific queries ... it just feels like the worst of both approaches. I'd much rather have all queries separated or all queries inline, myself. One or the other. And at the time the decision to use query files was made, the half-dozen or so of us discussing it seemed to agree on this point, at least.
I don't remember anyone suggesting that having some queries inline, others in query files was an agreeable solution.
To be honest if it weren't for the desire to support existing ACS packages with no change in our first OpenACS release, the db_* API procs would've been changed to take an explicit -sql parameter if SQL was being passed in (as we do for dynamic queries build from db_map calls) with no sql param passed in at all in the normal case.
Anyway I'm interested in feedback from other OCT members...
Part of our ability to do so (and produce military aviation quality code) rested on following a coding standard. It is MORE difficult for a novice programmer to figure out the system if things are done multiple ways.
I actually look forward to a time when we use the table-lite spec or the ancient (and secret) prototype-builder or some new-and-improved spec to generate XQL files from a table spec to allow rapid development (and forms and acs-objects, etc).. The more consistent we are with the way we build things, the easier the tool will be to write.
Having said this I understand where Lars and Joel are coming from. Putting queries inline is certainly convenient and very speedy during development. So then again, for consisency, if inline SQL is preferable, let's just always keep it inline.
Related to this issue, the thing I feel strongest about is removing inline SQL that is not being used. Regardless of where you keep your SQL, please just keep it in one place... Preferably we would write a script to remove all obsolete inline SQL.
I agree that we should absolutely remove current inline queries where they're also in an .XQL file, given that the .XQL files always take precedence.
But if we do remove inline SQL where also in the .XQL file, I don't think consistency is a major concern under this scheme, since the process for finding the SQL code will not be interrupted.
With the proposed scheme, after removing extra inline queries, but before moving all common queries from .xql file into .tcl file:
1. Open the Tcl file of the page or library proc to find the db_* command. (100%)
2. If there's inline SQL, you've got the query (~20%)
3. If not, go look in the generic .XQL file (~50%)
4. If not, find it in the db-specific .XQL file (~30%)
With the proposed scheme, after removing extra inline queries, and after moving all common queries from .xql file into .tcl file:
1. Open the Tcl file of the page or library proc to find the db_* command. (100%)
2. If there's inline SQL, you've got the query (70%)
3. If not, find it in the db-specific .XQL file (30%)
If this proposal is rejected (after removing extra inline queries):
1. Open the Tcl file of the page or library proc to find the db_* command. The SQL part will be empty. (100%)
2. Open the generic .XQL file, to see if the query is there (70%)
3. If not there, go check the db-specific .XQL file (30%)
My point is that the proposal doesn't interfere with the development flow, but instead it lets you save the hassle of opening and searching through another file 20-70% of the time.
Oh, well, Don's and my position are clear. Can we get some OCT votes, so we can get this matter resolved and move on, please.
I think Alfred's point is a good one, there is value to being able to write static code analyzers but I don't think sql in the tcl files is a significant roadblock to doing so (in particular, you need to have a pretty smart parser already to deduce the fully qualified qeury name and pulling the sql out at the same time is pretty trivial relative to that problem).
Given that to date we have not done anything fancy with reusable queries and that there is no support for paramaterizing queries (and I agree with lars it's better to put the query in a tcl proc and have that be the reuse mechanism), I just don't see a material downside to doing this.
I think we should continue down that road. And make it an easy coding standard to follow (SQL goes in .xql files). Otherwise people might get confused and put an Oracle specific query in the .tcl file. I'm sure it won't happen to OCT members, but we are not the only ones developing and for newbies sticking to simple consistent rules is IMO better.
I'm not against it in a way that I'd veto it, but I'd adwise against it if asked.
- To quickly find a query in an XQL file.
- To browse the list of queries in an XQL file.
- Execute queries directly from an XQL file, including bind variable substitution.
- Add new query definitions to an XQL file with a couple of key strokes.
The above work is complete and has been tested by a hand full of OpenACS developers besides myself.
While the use of XQL files does require some getting used to and does have a set of drawbacks I never found the use of XQL files problematic. If anything, the lingering queries in the Tcl code caused the most confusion. My vote is to remove all queries from Tcl and place them exclusively in XQL files.
Inline queries are a must for development, and for all the custom development that is for one database only and is never intended to be distributed to or used by anyone but the client, there generally isn't the slightest reason to use anything but inline queries.
Personally, in hindsight I don't think using XML (rather than a Tcl syntax) for the query files was such a great choice, but all the work was done way back when, and now that tdom is so much faster than ns_xml used to be it doesn't really matter much. I certainly don't see any reason to change that now.
In my experience there's only one practical problem with the current support for the mix of inline queries and queries in various different XQL files: It's very easy to accidentally define the same query in several different places, and then the query dispatcher will silently choose one of them. If feasible, it might be best to have a configuration switch to optionally make the query dispatcher throw an error whenever it encounters multiple definitions of the same query.
-sqlswitch in order to use inline queries sounds just fine to me. It's clearer than what we have now, and it should be fairly easy to go through a whole codebase and update the inline queries to that new syntax.
But some people did care so the decision was made to use queryfiles. The drive in this direction originally was made by aD and Ben when it was decided to put querynames in the db_* API (if the conditionalized inline query approach is taken there's no reason for queries to have names).
But I will repeat that no one argued for a mixed approach, at the time of the discussion the value of choosing one and only one approach was accepted by all. Adopting a mixed approach seems to me much worse than taking one direction or the other.
It's use would be optional ... all the "KEEP SQL AND TCL SEPARATE" folks could ignore it, make their cvs check-in and -outs as normal, and be happy. All the "IT'S EASIER TO UNDERSTAND WHEN THE LOGIC IS ON ONE PLACE" folks could do a cvs checkout, run the preprocessor, and have the code in a much easier to understand format. As a side benefit, you could start off your AOLServer/OpenACS instance without the humungous XML parse overhead, have a slightly lower memory footprint, lose a huge number of useless XQL files from your version control tree, and avoid a large chunk of query processing code (as well as the tDOM requirement -- making for a simpler install).
There is a small problem if they wanted to check code back into the project, but doing query extraction by hand or in combination with a tool that did most of it wouldn't be all that hard.
Speaking of which, when I brought this up before, someone pointed out that the existing query extractor (TCL+SQL -> TCL+XQL) tool (written in python) didn't really work very well. And, as another data point, I tried writing the aforementioned tool in perl without doing a formal grammar, and I think I got it to 90%+ working. Still not good enough. Despite that, I think that a more robust implementation using tclparse (which exposes the actual tcl parser) could work. From my experience, there were only a handful of dynamically assembled sql calls that an automatic approach might have trouble with.
I see XQL files as leaning way to heavily towards code organization (OpenACS maintainers) rather than ease of understanding for new folks or people trying to do work with OpenACS. That's understandable considering who came up with it. But it would be nice to come up with a solution that kept both camps happy.
This isn't the kind of thing you could do without good unit tests, but since we're not too far off from having those I thought I'd bring it up.
As far as inventing tools to make both camps happy ... it strikes me once again that as a community we have a very difficult time concentrating on the important things.
And this ain't one of them, frankly. We shouldn't be thinking about writing a tool in order to avoid making a decision.
If we write any tool at all, it should probably be one to organize the contents of query files inline, using -generic, -postgresql, and -oracle switches. Get rid of query files and the query name parameter and be done with it, if people don't like them. This would maintain the consistency I feel is so important, at least, Lord knows I have no emotional investment in queryfiles per se, I'm just devoted to doing things one way and one way only unless there's extremely good justification for not doing so.
But maybe I'm flogging a dead horse. The rest of the toolkit's a mishmash of conflicting coding styles, after all.
I like Don's suggestion of putting all queries back into the .tcl file, with switches for the sql for different databases. It would make code more readable for me, since the sql is situated in the context of the file where it's used. Also, it would be much simpler to port one-database files to multiple, since everything is right there in one file instead of spanning up to 3.
When I develop for my own local sites, I almost always put queries in the .tcl file. Going through the hassle of separating that out doesn't give me any benefits. If, as has happened with others, such local code is deemed applicable to the larger toolkit, that sql must first be pulled out of those .tcl files into the appropriate .xql. I think flags for the db_* query handlers would greatly ease the road for code contribution by eliminating the need for this step.
"I see XQL files as leaning way to heavily towards code organization (OpenACS maintainers) rather than ease of understanding for new folks or people trying to do work with OpenACS. ..it would be nice to come up with a solution that kept both camps happy."
a slow-learning OpenACS developer's perspective: I find keeping up with changes in development, including development style, has been the greatest problem. So many changes, so fast, is dizzying!
Any framework will have strengths and weaknesses. Even I see how the external named queries are great for reducing tcl code complexity, improving query presentation (xml) and providing reference names for discussion; And the inline queries are great for quick changes.
Having a system with two frameworks available can be a bonus if both can be supported without performance loss. Just document when is best to use each technique, please.
Of course that never happened, and probably never will. But if it ever does happen, those language-neutral XML query files will (as it stands now anyway) be there waiting for them. (Who knows, maybe those SMLserver guys will want to guys will want to write OpenACS modules in Standard ML. Maybe in some alternate universe.)
But while all that is useful to think about for whatever lessons it may teach us for future projects, it's all water long since under the bridge as far as OpenACS is concerned now.
Regardless, Lars's original proposal is much simpler and more focussed than most of this discussion. And I don't see any serious problem regardless of which way the decision goes on that proposal.
On the other hand, if that optional pre-processor is attractive and important purely because of performance problems (in even some cases) with handling the XQL files, then something else about the OpenACS XQL architecture or tools is probably wrong, and should be addressed separately.
Longer run, seems that Don has tentatively (hypothetically?) suggested
eliminating the XQL querfiles altogether, and defining a simple syntax
for putting all the db-specific SQL into the same Tcl files. Not
-sql switch for inline queries, but several at
The main advantage I see there is it lets (indeed, forces) you to see the same query for all the databases together at once. Upside is this would push much greater cross-db knowledge, and help eliminate spurious drift and redundancy in the Oracle vs. Postgres queries. All "official" OpenACS packages are supposed to support both databases anyway, and this would make the work of doing so somewhat easier.
The downside of course, is that it does tie support for the different databases more closely together. In the past people seemed to think it advantageous that, at least in theory, some outside group could decide to add support for a 3rd database just by adding a whole bunch of new XQL files, without having to touch the rest of the toolkit at all. But given history of Oracle and PostgreSQL support, and the continual ongoing development of the OpenACS toolkit itself, I doubt that any such "just drop in support for a 3rd database" project would be that simple. My guess is it that in reality it would need extensive co-ordination and some changes from the core OpenACS developers no matter what, whether the SQL is all in separate XQL files or not. What do you think, is my guess here wrong?
But, really, Don's
suggestion is a major change that would affect a lot of people, and
nobody's going to do it anytime soon, right? (My guess is probably
never, but not till OpenACS 5.1 at the very earliest.)
Six of one, half a dozen of the other...
If we change the way we deal with queries then we should rethink the whole thing and in that case I'd strongly prefer Don's (?) -generic, -oracle, -postgresql style. Until then, I'll happily use Bart's emacs extensions to make my life easier.
Consistency is the most important point. Changing the way it works now would be a huge task. That effort could be better applied to other items,
There are lots of good ideas here and some might be better then our current system but I feel that part of the job of the OCT is to prioritize and I don't see this as a priority right now.
Does this mean that any idea that might be good to implement, but doesn't make the top 3 priority list is in danger of being vetoed?! Frankly, I don't think that's the proper way to prioritize.
I understand your frustration. Moving all the queries is a huge job and touches almost every file in the system. It would be a good idea to write automated tests to make sure that after such a change, everything still worked the same.
I won't guess how many volunteers working for how long it would require, but I can imagine that there are better things to do with their time. I didn't mention any top 3 priorities, but there are still things that are actually broken that noone has stepped forward to address.
Sweeping these kinds of things under the rug isn't the way to go, imo. With Tcl TIPs, the one proposing the change is responsible for either implementing it, or finding someone who will implement it. A TIP approval says nothing about when it needs to be done, or even that the OCT must take time to do it themselves. I see it as moving that proposal onto the official "todo" list. Scheduling those todos comes later.
Furthermore, you don't loose functionality at the moment. All the veto says is: We do not encourage to put SQL code in the TCL code for consistency. And if you expect your code to make it into the toolkit you are highly encouraged to get you SQL code out of the TCL code and put it in XQL files. The teaching and training materials will train to use XQL files.
For development you can always us inline SQL (it's not like we'd turn the functionality of using inline SQL off).
2) I think the original TIP was slightly unclear in that it did not describe the current situation fully. The current situation is, I believe, that "best practice" is for all SQL to be in XQL files. The veto means that this is still the case.
3) The status quo leaves two obvious questions which I don't believe were directly addressed:
a) Will we reject packages that don't meet this best practice? (No idea what the consensus would be - sounds like a good show-of-hands question in an OCT meeting)
b) Will we change the kernel to enforce this best practice - ie, will we start ignoring any SQL in TCL files? Argument for: enforces consistency. Argument against: is a useful shortcut during development; requires lots of work with only a potential, future reward. I think this one is a clear no, and status quo is "no," so unless someone starts a new TIP I don't think this issue is open.
DaveB: no (?) Don: no Jeff: yes Lars: yes Roberto: ? Peter: ? Caroline: no Tilmann: no Malte: nowhere ? means didn't vote and (?) means I think it was a no but I am not sure if it was no to the proposal or if it was just no to trying to fix any that currently exist (remember that the original proposal is simply one of coding standards i.e. explicitly stating that having generic queries in the .tcl file is acceptible).
Apologies if I got anyones vote wrong here.
In any case it seems like the required 2/3 is not there so this is rejected and the status quo coding standard: all queries in .xql files with the exception of things that cannot be multidb by design (schema browser for example).
It's also worth noting that the existing packages do not adhere to the existing standard very well in any case, and regardless of whether this is approved there is a lot of work to be done to bring things into line.
What spurred this TIP is that we at Collaboraid have sped up our development by not wasting time creating and maintaining the .XQL files for common SQL.
In other words, we have contributed code with common SQL in the TCL files instead of the XQL files.
Vetoing or rejecting the current proposal incurs additional overhead on development. Specifically, it requires us to spend time that we could have on other things on the zero-gain task of moving SQL code out of Tcl files and into XQL files.
The time spent adhering to the coding standard required by this TIP could have been spent fixing bugs, pushing the release out, or adding new functionality.
As a result of the vote, we have chosen to prioritize our resources towards extra burdens on development over bug fixes.
Not a very useful prioritization IMO.
Oh, well, we can't all agree on everything. :)
If the TIP had passed I had every intention of being gracious about it, even though I disagreed with it.
Not all coding standards are as quibbly as perhaps the "use .xql files" standard is.
But we had a similar experience with .LRN, where OpenForce delivered a product that in many places is full of Tcl code dropped helter-skelter into .adp files. They did so without bothering to discuss the decision with the community, though they knew that the prohibition against doing so was a coding standard for not only OpenACS but aD's ACS 4.x project as well.
Why did they do it? Because it saved them time when writing certain especially tricky pages (at least they claimed it did).
Now I don't think you'll find a single OCT member who will endorse that practice, but due to the informality of the community and the fact that .LRN was a standalone effort by OF/Sloan in those days we didn't realize they were doing this until the code was released.
And many of us went "YECH!" when we saw what they'd done.
So I'm hoping we all at least see the value of deciding such things at the TIP level, even if we disagree over the rejection of this particular one ...
I did feel like some of the arguments given for rejection above misrepresented reality, so I saw a need to clarify, in case anyone wanted to think more about this going forward. If that came off as being ungracious, I apologize.
For the record, I didn't mean to unilaterally abandon anything. I wasn't aware that it was a decided coding standard until you brought my attention to it. Where is this documented again?
Regardless, now the TIP has been decided on. Let's put this to rest.
So this is actually a nice validation of the TIP process (thanks Joel) - we've got a location that we can point to in terms of the discussion and decision over a contentious issue.
Which is great!
(I think it would be helpful, though, to establish some preliminary workflow so that new ideas aren't immediately proposed as TIPs. We've got some 4 or 5 line TIPs that should be discussed in the design forum before being presented.)