Forum OpenACS Development: db_multirow and db_foreach -unclobber

Collapse
Posted by Lars Pind on
Hi all

I've added a switch -unclobber to db_multirow which causes it to restore the status of all the variables which it overwrites inside the code_chunk loop after it's done.

So if you have something like this:

db_multirow foo bar { 
    select object_type, creation_date from acs_objects 
} {
    ...
}
... then after the db_multirow, the 'object_type' local variable will contain whatever value the object_type column had in the last row of the query.

If, instead, you had said

db_multirow -unclobber foo bar { 
    select object_type, creation_date from acs_objects 
} {
    ...
}
(notice the -unclobber switch), then object_type would be in whichever state it was before the db_multirow: If it was undefined before, it'll still be undefined. If it had some value, it will have that value again.

My patch also handles arrays. And variables which do not correspond to columns in the query aren't touched, i.e., you can still use local variables as counters, etc.

The motivation was that I was annoyed that I couldn't use the same name for my page variables (e.g. 'project_id') as the column names in my query, because building the multirow would overwrite the values of the query vars.

Another solution to this problem, which I would like to look into at a later date, would be to use put query variables into a separate namespace where they're guaranteed to stay safe. This is a bit tricky, though, since I'd want it to work with include templates as well.

So far it only lives in my local checkout. I'm curious if this is something you'd all like to see go into OpenACS?

Should the default behavior be to clobber or unclobber?

/Lars

Collapse
Posted by Peter Marklund on
Lars,
I think unclobber is an excellent fix! I had the exact same problem that you are describing when building the Logger. I definetely think unclobber should be the default. It would surprise me if there is any code out there relying on the clobbering effect. If they are, we might be doing them a favor by breaking their code so that they can fix it.
Collapse
Posted by Bart Teeuwisse on
Lars,

why not fold the -unclobber patch into the standard db_ behavior and do without a new switch altogether? I can't think of anyone actually needing to clobber the variables in the calling code.

My 2 cents.

/Bart

Collapse
Posted by Jonathan Ellis on
Sometimes after a db_foreach it's handy to have the values for the last row processed still hanging around.  I'd prefer not making the introspection overhead mandatory when the current default makes just as much if not more sense.
Collapse
Posted by Peter Marklund on
I can have respect for upgradability arguments, but I have to point out that relying on the values in the last selected row of the db_multirow to "hang around" makes for very obscure, unmaintainable, and thus error prone code. It is certainly not the behaviour that I would expect from a db_mutlirow type command and I doubt that it was ever intentional.
Collapse
Posted by Jonathan Ellis on
I did say foreach, not multirow.
Collapse
Posted by Jarkko Laine on
Jonathan,

Sure you did, but what Peter said applies to db_foreach as well :o)

Collapse
Posted by Jonathan Ellis on
I don't see how.  It's neither obscure, since db_1row et al necessarily set variables in parent scope, nor bad form when you can't do what you need to do in the foreach.  For instance, if you select foo from bar order by foo and want to see if the last foo has a value over some cutoff point, what other options do you have?  Manually setting some variable(s) inside the foreach body to what you are interested in, with each iteration, just so it is preserved afterwards?  Forcing developers into that for the sake of some misguided purity is stupid.
Collapse
Posted by Jarkko Laine on
I can see your point about the cutoff point, Jonathan.

However, I don't think comparing db_1row et al which return just one row to these multirow procs is worthwhile. It's obvious what you get from db_1row, but it's definitely not that intuitive that you get the values of last db row in local variables when you use db_multirow or db_foreach. Like Peter, I wouldn't assume to get local variables overwritten.

I don't think this would be "Forcing developers into that for the sake of some misguided purity". Now we are forcing developers to save their local variables always when they use db_multirow or db_foreach if they aren't sure there's no columns in the query with the same name. I think that instead of misguided purity we are talking about which way would be more useful. I would take Peter's side but it's just my opinion.

Maybe there could be an additional code block for the last row of db_foreach for this, if it's really needed. Or a -clobber parameter in the spirit Lars mentioned.

Collapse
Posted by Jonathan Ellis on
It's impossible to prove intuitiveness so I won't try; I was just pointing out that to me it's not at all unreasonable to expect them to be set in the caller's scope, particularly if you're familiar with how tcl uplevel works.

If you are going to take the switch approach, I think the new functionality should be optional rather than break backwards compatibility needlessly.  As for the code block, I'd rather not overcomplicate the API when it's currently able to get the job done quite nicely.

Collapse
Posted by Robert Locke on
Looking at Tcl's foreach, if I do:
    set somevar old
    set somelist [list 1 2 3 new]
    foreach somevar $somelist {}

Then "somevar" will have a value of "new" (not "old") after the foreach has been executed.

I understand the motivation for Lars' change, but I find the current behavior more consistent with how I expect Tcl to work.

Perhaps add the noclobber option, but do not make it the default?  This also skirts the backward-compatibility issue.

Collapse
Posted by Vinod Kurup on
I think that db_foreach and db_multirow should have different behaviors. db_foreach should be an analog to TCL's foreach, so it should clobber vars in the local namespace. db_multirow is meant to create vars in a specific namespace (i.e. in an array which has the name of the first argument to db_multirow), so it should *not* clobber vars in the local namespace. IMHO, of course.
Collapse
Posted by Jonathan Ellis on
that makes sense to me.
Collapse
Posted by Tom Jackson on

The old ACS used to have a function that processed a select, and prepended a string to the column names so as to avoid clobbering local vars. One variant allowed you to set the value of the prepended string.

Although we can speculate as to what the correct behavior should have been for db_multirow, it seems to be a few years too late to change the default.

If instead of -unclobber, you used something like -prepend, you would never have to unclobber the vars, also allowing local vars to be available to the tcl code block for the db_multirow. The default to -prepend would be the empty string, allowing the current behavior. db_1row would also benefit from a -prepend switch.

db_multirow is also useful when you know only one row will be returned, and you want to use array/multiple syntax for handling the result.

Collapse
Posted by Peter Marklund on
Tcl's upvar feature is very powerful but should be used with great caution. In general, it is not a good idea for a procedure to clobber variables in its callers namespace unless it is exceedingly clear to the calller that this happens and the caller can control it. I don't think any looping construct should clobber variables in the surrounding code. As an example of this idea, consider the for loop in Java where the count variables only lives inside the scope of the loop. This is not "musguided purism", simply sound software practice. Procedures should be cohesive and have a very simple and well defined contract. Clobbering doesn's serve any of these two goals.

Lars and I both got burnt independently of eachother by the current behaviour of db_multirow. Changing the default would  mean that we would have to add the -clobber switch in probably less than 1% of invocations of the proc. That seems like a very cheap price to pay for making db_multirow more robust by making it interact less with its calling environment in the 99% of cases where this interaction is not needed or desired.

Collapse
Posted by Tom Jackson on

In fact it should be clear to anyone who has been working with AOLserver's ns_db API: when you are processing rows, the caller's environment is visable. The new db_* API is just a nice wrapper around that API. The proc is just there to allow re-use of code, not to protect the tcl page from unsound programming practices. This is what uplevel is supposed to be used for. If you don't want to clobber the vars, they really should not have the same name on your tcl page in the first place. Using one var name for two distinct purposes is bad programming practice.

I haven't looked at the code for db_multirow, but if it clobbers vars you would not expect it to clobber, like a loop var, or something, that should be documented, and possibly changed, as this would be a bug waiting to surface.

Collapse
Posted by Jonathan Ellis on
I haven't checked that specifically either, but aD was usually pretty careful about naming loop vars used in an uplevel block my_really_long_counter_i or something to avoid stepping on toes.
Collapse
Posted by Lars Pind on
A few comments ...

1) db_multirow doesn't clobber variables other than the ones that are named the same as your column.

2) I agree that using the same local variable for multiple purposes is bad practice, but the problem is not with the page design but with the system design.

I think that we've overloaded the local variables namespace. We stick the query variables from ad_page_contract there, we put the columns from our database procs there, and finally, we use the local variables to pass properties to the template.

I actually feel like all 3 of these need to be broken away from the local variables namespace somehow, since they're all part of a 'public API' of sorts, and thus you should have freedom in your choice of naming.

3) I also agree that db_foreach and db_multirow are modeled after foreach, and thus I can see the value of keeping the default behavior what it is now.

So if we could find a way to store query variables in a 'query::' namespace and properties in a 'property::' namespace, then maybe that would solve the problem better.

/Lars

Collapse
Posted by Andrew Piskorski on
Lars's "query::" and "property::" namespaces definitely sound like the best idea proposed so far. (Not that I've tried to implement it, though...)
Collapse
Posted by Tom Jackson on

Lars, Andrew:

I was thinking the namespace idea was something helpful as well, but when trying to write a reply that indicated so, I seemed to stall out as to why it would be a good idea.

The truth is, namespaces will not solve the problem, which is one that must be solved on a per page basis, taking into consideration the different variables being used.

Example: you write a query that has several column names that are the same. Obviously they have distinct meanings in the context of the query, so you use 'as' to rename the column. This is the exact problem we have, fortunately it is solved at the sql level in a self documenting format. The fact that higher up on the page, you had a similarly named variable doesn't remove the fact that the variables have distinct meanings, and should have distinct names. Putting the variable into a fixed namespace doesn't shed much light on the meaning of a variable, but renaming it would.

In ad_page_contract, it would be great to have an aliasing mechanism similar to those used in (pl/)sql, but the whole point of db_* procs is to operate in the same namespace, and hopefully the same level. Creating separate namespaces would only foster confusion. Why should variables be named based upon where they came from (query::, property::)? They should be named based on their meaning and use on that page.

Collapse
Posted by Peter Marklund on
Separating ad_page_contract vars into a namespace or an array sounds like a reasonable idea to me. The fact is that when you start to have long and complex pages with maybe a handful of db_multirows and loads of local variables and query variables things start to get messy and error prone.

I had the privilege of working in Munich with Hrvoje Niksic, the author of wget and a very talented programmer. As we were sifting through some old aD code (quite possibly written by Phil G himself) Hrvoje used to joke that hmm, I wonder if a variable with this name is set in this scope, I really have no idea, but it probably is, so I'll just try it... This anecdote to me illustrates the general need for increased variable space separation in the toolkit.

Finally I just want to point out that the issue of the query and property namespaces work in the same direction as the proposed change to make db_multirow well behaved and not clobber its callers environment. The query and property namespaces do not fully solve the clobbering problem though - a problem that needs separate attention.

Collapse
Posted by Tom Jackson on

ad_page_contract turns vars with a dot in the name into arrays, so you can't put them into another array, I don't think.

I have written perhaps hundreds of complex pages with multiple db_multirow, db_1row, etc. I have yet to find a problem. What is the problem? Can we get one actual example of a problem first. We currently have an entire toolkit that somehow avoids the problem. When you rewrite the toolkit to use property:: and query::, what is a tcl page going to look like once converted?

I'm just guessing that you might end up with something like property::object_id, query::object_id and object_id on the same page. Is this correct? Of course the whole thing falls apart as soon as you have two queries with object_id in them. Please tell me how I am wrong, because I usually misunderstand the most basic concepts.

Collapse
Posted by Peter Marklund on
Tom,
I think my variable interference was that I had a set user_id [ad_conn user_id] and then I had possibly multiple db_multirows that selected user_id so that the value of user_id as the script progressed would change in undesired ways. At any point in the script the value of user_id would be fairly unpredictable and this broke my page.

I'm not sure I see the need for a namespace for db queries. I think it might be sufficient to have them be local variables. For one row queries I think arrays provide good namespace separation, selecting a single value can go into a local variable, if you are doing a db_multirow you should only be initializing the multirow data structure without clobbering the envrionment.

It *would* be nice if we could make the properties block of ad_page_contract actually do something though (strictly give the adp access only to vars listed in that block). That would have solved my interference problem in the adp, but not in the tcl script.

Collapse
Posted by Lars Pind on
The example I had is a page that lists logger entries. A logger entry is keyed to a user, a project, a variable, etc., through columns user_id, project_id, variable_id.

Then I want that page to be able to show me only rows with a certain user_id, project_id, and/or variable_id.

I also need to list the available project_id's, user_id's, and variable_id's, to let the user choose which ones to show.

Now, of course, when I iterate over the possible user_id's, unless I rename either the public page variable in ad_page_contract to something like 'selected_user_id', or rename the column on the fly using 'as row_user_id', which means that'll also be the name of that column in the multirow which we create, which I also find bad.

Bottom line: As things stand today, I have to find another name for my variable in one of two places (the page query or the template property), both of which are part of the page's "public API".

Hope that clarified things a bit.

/Lars

Collapse
Posted by Tom Jackson on

It seems to me the on the fly renaming of the query var is the way to go, since that is already supported, and everyone knows what you are doing, and probably why. The name of the var in @foo.bar@ never appears on the rendered page, but the programmer still has to see it. I can see that it might be annoying to have longer names for the vars, when you think you are just creating an array.

I think the array variable should be thought of more as a convenience for iterating over the rows, actually a quite inconvenient 'convenience' when you think about it. The vars could just have easily been named foo.bar.1, foo.baz.1, etc. which would have made some things easier to do on the adp page. But there should not be a difference in how things act based on selecting one row or mulitple rows. The data has the same meaning regardless. If there is a naming conflict with one row, it should also exist with multiple rows.

-unclobber might still be a good idea, but it might be nice to be able to specify a list of vars to unclobber. Does your code somehow parse the query to figure out what to unclobber?

Collapse
Posted by Don Baccus on
I've been staying out of this but I have to say I agree with Tom on this issue.

After all these years the fact that "user_id" is used confusingly just now has become a problem that needs to be resolved by adding namespaces requiring us to touch every page in the toolkit?

At the most extreme that's what's being talked about.

A couple of quick observations ...

1. Tcl is not a block-scoped language, talking about block-local vars ala Java doesn't really make sense.  It's proc-scoped.  Uplevel's not particularly cool but the way it is used in the db_* procs is consistent with the db API and easy to understand.  Given that Tcl doesn't support reference parameters I think the design's a reasonable compromise.

2. I like consistency.  Making some db_* procs work one way, others work another way is just confusing for newcomers and error-prone for everyone else.

3. "user_id", "package_id" and a few other ids are overloaded in the datamodel and ad_conn structure.  The examples mentioned above pinpoint *this* as being the real problem.  If people got in the habit of saying

set current_user_id [ad_conn user_id]

instread of

set user_id [ad_conn user_id]

then from reading this thread, at least, I get the impression that 90% of the supposed problem would fade away.

When I look at all the *important* problems we still need to solve in the toolkit I have to wonder why we would consider this particular minor issue even worth the time spent discussing it???

Collapse
Posted by Peter Marklund on
Don,
I agree about the relative unimportance of this issue. Also, changing variable name to current_user_id happens to be exactly the solution I picked for my scripts.

However, I still think the unclobbering solution by Lars is correct, and if I can't make people think it should be the default I will at least use the -unclobber switch for all my multirows, that's for sure.

Collapse
Posted by Lars Pind on
A couple of quick observations from here:

1. I already have the code sitting in my local checkout. It took about 20 minutes to write. I just posted to ask if it would be useful to other people.

2. The problem can be solved by renaming variables, yes. But it's not quite as simple as you make it. Or, rather, the thing that annoys me is that you have to rename *either* your page variables (which are part of your page's public API), your table columns (which are part of your application's API), or the column names in the multirow (also part of your page's API).

I admit that it's not a terribly big problem, it was just something that bit me and annoyed me a couple too many times, so I figured I'd take 20 minutes to implement a simple solution.

/Lars

Collapse
Posted by Tom Jackson on

Lars, as everyone here is painfully aware of by now, I get annoyed by changes to the core API, and core philosophy. And one of the main reasons is that I feel, rightly or wrongly, that these changes can be introduced with the tinyiest of discussions over the shortest period of time. If you go away for a three day weekend, the decision has already been made.

Even though your 20 minute fix is backwards compatible, it still adds to the processing of every db_multirow call. When someone has to wade through that code to trace some bug, in the future, they will have to consider your addition. I'm also not sure why the -unclobber only works if you have a code_block block. Shouldn't it work regardless?

Why not a non-core change? I would suggest a save-restore package, that might take more than 20 minutes to write, but it would not affect the core, and you could re-use the code anywhere it might help.

Pesonally I feel it is the job of the page programmer to work out these naming conflicts. The tcl/adp page are one thing. In fact the entire set of code that is executed in response to a request is one thing. Ignoring this fact will lead to bugs which are hard to trace.

Maybe it has something to do with my name. Whenever I call someone up, even close friends, I introduce myself as 'Tom Jackson'. I realize that the name 'Tom' could mean many things to most of my friends; I want to make sure they understand the correct meaning. Some folks call me up and just start talking, as if they needed no name. I imagine these are Perl programmers of some sort.

The programmer writing a page has the most information and knowledge of what each variable means. Naming them appropriately is the best method of communicating and documenting this information.

Is the logger page you are working on going to become part of the newly released logger package?

Collapse
Posted by Lars Pind on
For what it's worth, here's the diff:
Index: 00-database-procs.tcl
===================================================================
RCS file: /cvsroot/openacs-4/packages/acs-tcl/tcl/00-database-procs.tcl,v
retrieving revision 1.19.2.5
diff -u -r1.19.2.5 00-database-procs.tcl
--- 00-database-procs.tcl	23 May 2003 13:11:29 -0000	1.19.2.5
+++ 00-database-procs.tcl	3 Jun 2003 08:05:35 -0000
@@ -442,6 +442,7 @@
 ad_proc -public db_multirow {
     -local:boolean
     -append:boolean
+    -unclobber:boolean
     {-extend {}}
     var_name
     statement_name
@@ -584,6 +585,24 @@
 Columns in this query: [join [lsort -ascii $local_columns] ", "]" "" "ACS_MULTIROW_APPEND_COLUMNS_MISMATCH"
                     }
                 }
+
+                # Save values of columns which we might clobber
+                if { $unclobber_p && ![empty_string_p $code_block] } {
+                    foreach col $columns {
+                        upvar 1 $col column_value __saved_$col column_save
+
+                        if { [info exists column_value] } {
+                            if { [array exists column_value] } {
+                                array set column_save [array get column_value]
+                            } else {
+                                set column_save $column_value
+                            }
+
+                            # Clear the variable
+                            unset column_value
+                        }
+                    }
+                }
             }
 
 	    if { [empty_string_p $code_block] } {
@@ -650,6 +669,30 @@
             incr local_counter
 	    set array_val(rownum) $counter
 	}
+    }
+
+    # Restore values of columns which we've saved
+    if { $unclobber_p && ![empty_string_p $code_block] && $local_counter > 0 } {
+        foreach col $columns {
+            upvar 1 $col column_value __saved_$col column_save
+
+            # Unset it first, so the road's paved to restoring
+            if { [info exists column_value] } {
+                unset column_value
+            }
+
+            # Restore it
+            if { [info exists column_save] } {
+                if { [array exists column_save] } {
+                    array set column_value [array get column_save]
+                } else {
+                    set column_value $column_save
+                }
+                
+                # And then remove the saved col
+                unset column_save
+            }
+        }
     }
 
     # If the if_no_rows_code is defined, go ahead and run it.
Collapse
Posted by Lars Pind on
Let's see if we can't find a way to table this unproductive disucssion.

1) Nothing's been changed yet.

2) The only overhead it adds if you don't use it is two if's with very simple conditions.

3) It only unclobbers when there's a code block, because it's only when there's a code block that db_multirow clobbers any variables (a current slight difference from Tcl's foreach, actually).

4) I think I disagree with this statement, although I'm not sure what "one thing" means in this context: "Pesonally I feel it is the job of the page programmer to work out these naming conflicts. The tcl/adp page are one thing. In fact the entire set of code that is executed in response to a request is one thing. Ignoring this fact will lead to bugs which are hard to trace."

The whole idea behind ad_page_contract is to document which parameters the page takes and which properties it supplies for the template designer. That's why ad_page_contract is defined in a file called tcl-documentation-procs.tcl, and that's why I still consider page parameters and properties, including the names of columns in a multirow proprety, part of the page's public API.

It seems like we disagree on this point. Fair enough.

5) It's not a non-core change, because since I'd run into this problem before while following the coding style which I  prefer and thought was fairly standard, I figured it was a general problem and I believe these are best solved with general solutions.

6) Yes, the logger page I am working on is going to become part of the newly released logger pacakge, but obviously if people (and yes, I'd like to hear the opinions of more people) are opposed to adding -unclobber, I'll have to rewrite things a bit.

7) Please don't take your general frustration out on me. I haven't committed anything. I posted to ask for people's opinion. I didn't manage to get opinions from many different people, but I did get your opinion quite clearly, thank you. :)

/Lars

Collapse
Posted by Tom Jackson on

An interesting point is that db_multirow, without a foreach block, doesn't clobber the local vars. This means the original intent of db_multirow was to not create these vars in the local context. I believe that the foreach code block was added recently to make it possible to do per row processing. Given that the new db_multirow combines features of the old nonclobbering db_multirow and the clobbering db_foreach, I can see the reason behind your original proposal as it applies to db_multirow.

For db_multirow, the question of what should be the default (-clobber/--unclobber) is still important. If you look at ad_page_contract properties, the multirow is usually only specified as the array name, not the individual column names. If you add a foreach block to the db_multirow for the purpose of, for example, adding a calculated column to each row, your intent is not to have to deal with all these variables that now exist, and possibly have overwritten, your local variables. I'm almost to the point of thinking that -unclobber should be the default, so that things match up with the original db_multirow behavior.

A similar intent could be applied to db_foreach. Many times I have used db_foreach for a purpose not related to setting local variables, and never for setting the variables that are reset for each iteration of the loop.

Lars, I'm sorry that I misintrepreted your reasons for the changes.