One of OpenACS's great strengths is that code written for it is very close to the database. It is very easy to interact with the database from anywhere within OpenACS. Our goal is to develop a coherent API for database access which makes this even easier.
There were four significant problems with the way OpenACS previously used the database (i.e., directly through the ns_db
interface):
-
Handle management. We required code to pass database handles around, and for routines which needed to perform database access but didn't receive a database handle as input, it was difficult to know from which of the three "magic pools" (main, subquery, and log) to allocate a new handle.
-
Nested transactions. In our Oracle driver, begin transaction
really means "turn auto-commit mode off" and end transaction
means "commit the current transaction and turn auto-commit mode on." Thus if transactional code needed to call a routine which needed to operate transactionally, the semantics were non-obvious. Consider:
proc foo { db args } {
db_transaction {
...
}
}
db_transaction {
db_dml unused "insert into greeble(bork) values(33)"
foo $db
db_dml unused "insert into greeble(bork) values(50)"
}
This would insert greeble #33 and do all the stuff in foo
transactionally, but the end transaction
in foo
would actually cause a commit, and greeble #50 would later be inserted in auto-commit mode. This could cause subtle bugs: e.g., in the case that the insert for greeble #50 failed, part of the "transaction" would have already have been committed!. This is not a good thing.
-
Unorthodox use of variables. The standard mechanism for mapping column values into variables involved the use of the set_variables_after_query
routine, which relies on an uplevel variable named selection
(likewise for set_variables_after_subquery
and subselection
).
-
Hard-coded reliance on Oracle. It's difficult to write code supporting various different databases (dynamically using the appropriate dialect based on the type of database being used, e.g., using DECODE
on Oracle and CASE ... WHEN
on Postgres).
The Database Access API addresses the first three problems by:
-
making use of database handles transparent
-
wrapping common database operations (including transaction management) in Tcl control structures (this is, after all, what Tcl is good at!)
It lays the groundwork for addressing the fourth problem by assigning each SQL statement a logical name. In a future version of the OpenACS Core, this API will translate logical statement names into actual SQL, based on the type of database in use. (To smooth the learning curve, we provide a facility for writing SQL inline for a "default SQL dialect", which we assume to be Oracle for now.)
To be clear, SQL abstraction is not fully implemented in OpenACS 3.3.1. The statement names supplied to each call are not used by the API at all. The API's design for SQL abstraction is in fact incomplete; unresolved issues include:
-
how to add WHERE
clause criteria dynamically
-
how to build a dynamic ORDER BY
clause (Ben Adida has a proposed solution for this)
-
how to define a statement's formal interface (i.e., what bind variables it expects, what columns its SELECT
clause must contain if it's a query) without actually implementing the statement in a specific SQL dialect
So why is the incremental change of adding statement naming to the API worth the effort? It is worth the effort because we know that giving each SQL statement a logical name will be required by the complete SQL abstraction design. Therefore, we know that the effort will not be wasted, and taking advantage of the new support for bind variables will already require code that uses 3.3.0 version of the API to be updated.
set_variables_after_query
is gone! (Well, it's still there, but you'll never need to use it.) The new API routines set local variables automatically. For instance:
db_1row select_names "select first_names, last_name from users where user_id = [ad_get_user_id]"
doc_body_append "Hello, $first_names $last_name!"
Like ns_db 1row
, this will bomb if the query doesn't return any rows (no such user exists). If this isn't what you want, you can write:
if { [db_0or1row select_names "select first_names, last_name from users where user_id = [ad_get_user_id]"] } {
doc_body_append "Hello, $first_names $last_name!"
} else {
# Executed if the query returns no rows.
doc_body_append "There's no such user!"
}
Selecting a bunch of rows is a lot prettier now:
db_foreach select_names "select first_names, last_name from users" {
doc_body_append "Say hi to $first_names $last_name for me!<br>"
}
That's right, db_foreach
is now like ns_db select
plus a while
loop plus set_variables_after_query
plus an if
statement (containing code to be executed if no rows are returned).
db_foreach select_names "select first_names, last_name from users where last_name like 'S%'" {
doc_body_append "Say hi to $first_names $last_name for me!<br>"
} if_no_rows {
doc_body_append "There aren't any users with last names beginnings with S!"
}
Introduction
Most SQL statements require that the code invoking the statement pass along data associated with that statement, usually obtained from the user. For instance, in order to delete a WimpyPoint presentation, a Tcl script might use the SQL statement
delete from wp_presentations where presentation_id = some_presentation_id
where some_presentation_id
is a number which is a valid presentation ID of the presentation I want to delete. It's easy to write code handling situations like this since SQL statements can include bind variables, which represent placeholders for actual data. A bind variable is specified as a colon followed by an identifier, so the statement above can be coded as:
db_dml presentation_delete {
delete from wp_presentations where presentation_id = :some_presentation_id
}
When this SQL statement is invoked, the value for the bind variable :some_presentation_id
is pulled from the Tcl variable $some_presentation_id
(in the caller's environment). Note that bind variables are not limited to one per statement; you can use an arbitrary number, and each will pull from the correspondingly named Tcl variable. (Alternatively, you can also specify an list or ns_set
providing bind variables' values; see Usage.)
The value of a bind variable is taken literally by the database driver, so there is never any need to put single-quotes around the value for a bind variable, or to use db_quote
to escape single-quotes contained in the value. The following works fine, despite the apostrophe:
set exclamation "That's all, folks!"
db_dml exclamation_insert { insert into exclamations(exclamation) values(:exclamation) }
Note that you can use a bind variable in a SQL statement only where you could use a literal (a number or single-quoted string). Bind variables cannot be placeholders for things like SQL keywords, table names, or column names, so the following will not work, even if $table_name
is set properly:
select * from :table_name
Why Bind Variables Are Useful
Why bother with bind variables at all - why not just write the Tcl statement above like this:
db_dml presentation_delete "
delete from wp_presentations where presentation_id = $some_presentation_id
"
(Note the use of double-quotes to allow the variable reference to $some_presentation_id
to be interpolated in.) This will work, but consider the case where some devious user causes some_presentation_id
to be set to something like '3 or 1 = 1'
, which would result in the following statement being executed:
delete from wp_presentations where presentation_id = 3 or 1 = 1
This deletes every presentation in the database! Using bind variables eliminates this gaping security hole: since bind variable values are taken literally. Oracle will attempt to delete presentations whose presentation ID is literally '3 or 1 = 1'
(i.e., no presentations, since '3 or 1 = 1'
can't possibly be a valid integer primary key for wp_presentations
. In general, since Oracle always considers the values of bind variables to be literals, it becomes more difficult for users to perform URL surgery to trick scripts into running dangerous queries and DML.
Usage
Every db_*
command accepting a SQL command as an argument supports bind variables. You can either
-
specify the -bind
switch to provide a set with bind variable values, or
-
specify the -bind
switch to explicitly provide a list of bind variable names and values, or
-
not specify a bind variable list at all, in which case Tcl variables are used as bind variables.
The default behavior (i.e., if the -bind
switch is omitted) is that these procedures expect to find local variables that correspond in name to the referenced bind variables, e.g.:
set user_id 123456
set role "administrator"
db_foreach user_group_memberships_by_role {
select g.group_id, g.group_name
from user_groups g, user_group_map map
where g.group_id = map.user_id
and map.user_id = :user_id
and map.role = :role
} {
# do something for each group of which user 123456 is in the role
# of "administrator"
}
The value of the local Tcl variable user_id
(123456) is bound to the user_id
bind variable.
The -bind
switch can takes the name of an ns_set
containing keys for each bind variable named in the query, e.g.:
set bind_vars [ns_set create]
ns_set put $bind_vars user_id 123456
ns_set put $bind_vars role "administrator"
db_foreach user_group_memberships_by_role {
select g.group_id, g.group_name
from user_groups g, user_group_map map
where g.group_id = map.user_id
and map.user_id = :user_id
and map.role = :role
} -bind $bind_vars {
# do something for each group in which user 123456 has the role
# of "administrator"
}
Alternatively, as an argument to -bind
you can specify a list of alternating name/value pairs for bind variables:
db_foreach user_group_memberships_by_role {
select g.group_id, g.group_name
from user_groups g, user_group_map map
where g.group_id = map.user_id
and map.user_id = :user_id
and map.role = :role
} -bind [list user_id 123456 role "administrator"] {
# do something for each group in which user 123456 has the role
# of "administrator"
}
Nulls and Bind Variables
When processing a DML statement, Oracle coerces empty strings into null
. (This coercion does not occur in the WHERE
clause of a query, i.e. col = ''
and col is null
are not equivalent.)
As a result, when using bind variables, the only way to make Oracle set a column value to null
is to set the corresponding bind variable to the empty string, since a bind variable whose value is the string "null" will be interpreted as the literal string "null".
These Oracle quirks complicate the process of writing clear and abstract DML difficult. Here is an example that illustrates why:
#
# Given the table:
#
# create table foo (
# bar integer,
# baz varchar(10)
# );
#
set bar ""
set baz ""
db_dml foo_create "insert into foo(bar, baz) values(:bar, :baz)"
#
# the values of the "bar" and "baz" columns in the new row are both
# null, because Oracle has coerced the empty string (even for the
# numeric column "bar") into null in both cases
Since databases other than Oracle do not coerce empty strings into null
, this code has different semantics depending on the underlying database (i.e., the row that gets inserted may not have null as its column values), which defeats the purpose of SQL abstraction.
Therefore, the Database Access API provides a database-independent way to represent null
(instead of the Oracle-specific idiom of the empty string): db_null
.
Use it instead of the empty string whenever you want to set a column value explicitly to null
, e.g.:
set bar [db_null]
set baz [db_null]
db_dml foo_create "insert into foo(bar, baz) values(:bar, :baz)"
#
# sets the values for both the "bar" and "baz" columns to null
Normally, db_foreach
, db_0or1row
, and db_1row
places the results of queries in Tcl variables, so you can say:
db_foreach users_select "select first_names, last_name from users" {
doc_body_append "<li>$first_names $last_name\n"
}
However, sometimes this is not sufficient: you may need to examine the rows returned, to dynamically determine the set of columns returned by the query, or to avoid collisions with existing variables. You can use the -column_array
and -column_set
switches to db_foreach
, db_0or1row
, and db_1row
to instruct the database routines to place the results in a Tcl array or ns_set
, respectively, where the keys are the column names and the values are the column values. For example:
db_foreach users_select "select first_names, last_name from users" -column_set columns {
# Now $columns is an ns_set.
doc_body_append "<li>"
for { set i 0 } { $i < [ns_set size $columns] } { incr i } {
doc_body_append "[ns_set key $columns $i] is [ns_set value $columns $i]. \n"
}
}
will write something like:
-
first_names is Jon. last_name is Salz.
-
first_names is Lars. last_name is Pind.
-
first_names is Michael. last_name is Yoon.