Created by Anett Szabo, last modified by Gustaf Neumann 13 Feb 2009, at 09:37 AM
Declarative languages can be very powerful and reliable, but sometimes
it is easier to think about things procedurally. One way to do this is
by using a procedural language in the database client. For example,
with AOLserver we generally program in Tcl, a procedural language, and
read the results of SQL queries. For example, in the /news module of
the ArsDigita Community System, we want to
- query for the current news
- loop through the rows that come back and display one line for each
row (with a link to a page that will show the full story)
- for the first three rows, see if the news story is very short. If
so, just display it on this page
The words above that should give a SQL programmer pause are in the last
bullet item:
if and
for the first three rows. There are
no clean ways in standard SQL to say "do this just for the first N rows"
or "do something special for a particular row if its data match a
certain pattern".
Here's the AOLserver Tcl program. Note that depending on the contents
of an item in the news
table, the Tcl program may execute
an SQL query (to figure out if there are user comments on a short news
item).
set selection [ns_db select $db "select *
from news
where sysdate between release_date and expiration_date
and approved_p = 't'
order by release_date desc, creation_date desc"]
while { [ns_db getrow $db $selection] } {
set_variables_after_query
# we use the luxury of Tcl to format the date nicely
ns_write "<li>[util_AnsiDatetoPrettyDate $release_date]: "
if { $counter <= 3 && [string length $body] < 300 } {
# it is one of the top three items and it is rather short
# so, let's consider displaying it right here
# first, let's go back to Oracle to find out if there are any
# comments on this item
set n_comments [database_to_tcl_string $db_sub "select count(*) from general_comments where on_what_id = $news_id and on_which_table = 'news'"]
if { $n_comments > 0 } {
# there are some comments; just show the title
ns_write "<a href=\"item.tcl?news_id=$news_id\">$title</a>\n"
} else {
# let's show the whole news item
ns_write "$title\n<blockquote>\n[util_maybe_convert_to_html $body $html_p]\n"
if {[parameter::get -parameter SolicitCommentsP -default 1]} {
ns_write "<br><br>\n<A HREF=\"comment-add.tcl?news_id=$news_id\">comment</a>\n"
}
ns_write "</blockquote>\n"
}
} else {
ns_write "<a href=\"item.tcl?news_id=$news_id\">$title</a>\n"
}
}
Suppose that you have a million rows in your news table, you want five,
but you can only figure out which five with a bit of procedural logic.
Does it really make sense to drag those million rows of data all the way
across the network from the database server to your client application
and then throw out 999,995 rows?
Or suppose that you're querying a million-row table and want the results
back in a strange order. Does it make sense to build a million-row data
structure in your client application, sort them in the client program,
then return the sorted rows to the user?
Visit http://www.scorecard.org/chemical-profiles/
and search for "benzene". Note that there are 328 chemicals whose names
contain the string "benzene":
select count(*)
from chemical
where upper(edf_chem_name) like upper('%benzene%');
COUNT(*)
----------
328
The way we want to display them is
- exact matches on top
- line break
- chemicals that start with the query string
- line break
- chemicals that contain the query string
Within each category of chemicals, we want to sort alphabetically.
However, if there are numbers or special characters in front of a
chemical name, we want to ignore those for the purposes of sorting.
Can you do all of that with one query? And have them come back from the
database in the desired order?
You could if you could make a procedure that would run inside the
database. For each row, the procedure would compute a score reflecting
goodness of match. To get the order correct, you need only ORDER BY
this score. To get the line breaks right, you need only have your
application program watch for changes in score. For the fine tuning of
sorting equally scored matches alphabetically, just write another
procedure that will return a chemical name stripped of leading special
characters, then sort by the result. Here's how it looks:
select edf_chem_name,
edf_substance_id,
score_chem_name_match_score(upper(edf_chem_name),upper('%benzene%'))
as match_score
from chemical
where upper(edf_chem_name) like upper('%benzene%');
order by score_chem_name_match_score(upper(edf_chem_name),upper('benzene')),
score_chem_name_for_sorting(edf_chem_name)
We specify the procedure score_chem_name_match_score
to
take two arguments: one the chemical name from the current row, and one
the query string from the user. It returns 0 for an exact match, 1 for
a chemical whose name begins with the query string, and 2 in all other
cases (remember that this is only used in queries where a LIKE clause
ensures that every chemical name at least contains the query string.
Once we defined this procedure, we'd be able to call it from a SQL
query, the same way that we can call built-in SQL functions such as
upper
.
So is this possible? Yes, in all "enterprise-class" relational database
management systems. Historically, each DBMS has had a proprietary
language for these stored procedures. Starting in 1997, DBMS
companies began to put Java byte-code interpreters into the database
server. Oracle added Java-in-the-server capability with its 8.1
release in February 1999. If you're looking at old systems such as
Scorecard, though, you'll be looking at procedures in Oracle's venerable
PL/SQL language:
create or replace function score_chem_name_match_score
(chem_name IN varchar, query_string IN varchar)
return integer
AS
BEGIN
IF chem_name = query_string THEN
return 0;
ELSIF instr(chem_name,query_string) = 1 THEN
return 1;
ELSE
return 2;
END IF;
END score_chem_name_match_score;
Notice that PL/SQL is a strongly typed language. We say what arguments
we expect, whether they are IN or OUT, and what types they must be. We
say that
score_chem_name_match_score
will return an
integer. We can say that a PL/SQL variable should be of the same type
as a column in a table:
create or replace function score_chem_name_for_sorting (chem_name IN varchar)
return varchar
AS
stripped_chem_name chem_hazid_ref.edf_chem_name%TYPE;
BEGIN
stripped_chem_name := ltrim(chem_name,'1234567890-+()[],'' #');
return stripped_chem_name;
END score_chem_name_for_sorting;
The local variable
stripped_chem_name
is going to be the
same type as the
edf_chem_name
column in the
chem_hazid_ref
table.
If you are using the Oracle application SQL*Plus to define PL/SQL
functions, you have to terminate each definition with a line containing
only the character "/". If SQL*Plus reports that there were errors in
evaluating your definition, you then have to type "show errors" if you
want more explanation. Unless you expect to write perfect code all the
time, it can be convenient to leave these SQL*Plus incantations in your
.sql files. Here's an example:
-- note that we prefix the incoming arg with v_ to keep it
-- distinguishable from the database column of the same name
-- this is a common PL/SQL convention
create or replace function user_group_name_from_id (v_group_id IN integer)
return varchar
IS
-- instead of worrying about how many characters to
-- allocate for this local variable, we just tell
-- Oracle "make it the same type as the group_name
-- column in the user_groups table"
v_group_name user_groups.group_name%TYPE;
BEGIN
if v_group_id is null
then return '';
end if;
-- note the usage of INTO below, which pulls a column
-- from the table into a local variable
select group_name into v_group_name
from user_groups
where group_id = v_group_id;
return v_group_name;
END;
/
show errors
Choosing between PL/SQL and Java
How to choose between PL/SQL and Java? Easy: you don't get to choose.
In lots of important places, e.g., triggers, Oracle forces you to
specify blocks of PL/SQL. So you have to learn at least the rudiments
of PL/SQL. If you're going to build major packages, Java is probably a
better long-term choice.
Reference
---
based on SQL for Web Nerds
Created by Anett Szabo, last modified by Gustaf Neumann 13 Feb 2009, at 09:27 AM
Control structures let you say "run this fragment of code if X is true" or "do this a few times" or "do this until something is no longer true". The available control structures in Tcl may be grouped into the following categories:
- conditional
- looping (iteration)
- error-handling
- miscellaneous (non-local exit)
The Fundamental Conditional Command: if
The most basic Tcl control structure is the if
command:
if boolean ?then? body1 ?else? ?body2?
Note that the words "then" and "else" are optional, as is the entire else clause. The most basic
if
statement looks like this:
if {condition} {
body
}
In the ArsDigita Community System, we always leave out the "then", but if we include an
else
or
elseif
clause, we put in those optional words. Consistency is the hobgoblin of little minds...
if {condition} {
body
} elseif {other_condition} {
alternate_body
} else {
another_body
}
Note how the curly braces and keywords are artfully positioned so that the entire
if
statement is on one line as far as the interpreter is concerned, i.e., all the newlines are grouped within curly braces. An easy way to break your program is to rewrite the above statement as follows:
if {condition} {
body
} elseif {other_condition} {
alternate_body
} else {
another_body
}
The Tcl interpreter will think that the
if
statement has ended after the first body and will next try to evaluate "elseif" as a procedure.
Let's look at an example from http://software.arsdigita.com/www/register/user-login.tcl. At this point in the ArsDigita Community System flow, a user has already typed his or her email address.
# Get the user ID
set selection [ns_db 0or1row $db "select user_id, user_state, converted_p from users \
where upper(email)=upper('$QQemail')"]
if {$selection == ""} {
# Oracle didn't find a row; this email addres is not in the database
# redirect this person to the new user registration page
ns_returnredirect "user-new.tcl?[export_url_vars return_url email]"
return
}
The same page provides an example both of nested
if
and
if then else
:
if {[parameter::get -parameter AllowPersistentLoginP -default 1]} {
# publisher has elected to provide an option to issue
# a persistent cookie with user_id and crypted password
if {[parameter::get -parameter PersistentLoginDefaultP -default 1]} {
# persistent cookie shoudl be the default
set checked_option "CHECKED"
} else {
set checked_option ""
}
ns_write "<input type=checkbox name=persistent_cookie_p value=t $checked_option>
Remember this address and password?
(<a href=\"explain-persistent-cookies.adp\">help</a>)"
}
Notice that the conventional programming style in Tcl is to call
if
for effect rather than value. It would work just as well to write the inner
if
in a more Lisp-y style:
set checked_option [if {[parameter::get ...]} {
subst "CHECKED"
} else {
subst ""
}]
This works because
if
returns the value of the last expression evaluated. However, being correct and being comprehensible to the community of Tcl programmers are different things. It is best to write code adhering to indentation and other stylistic conventions. You don't want to be the only person in the world capable of maintaining a service that has to be up 24x7.
Another Conditional Command: switch
The switch
dispatches on the value of its first argument: particular variable as follows:
switch flags value {
pattern1 body1
pattern2 body2
...
}
If
http://software.arsdigita.com/www/register/user-login.tcl finds a user in the database, it uses a
switch
on the user's state to decide what to do next:
switch $user_state {
"authorized" { # just move on }
"banned" {
ns_returnredirect "banned-user.tcl?user_id=$user_id"
return
}
"deleted" {
ns_returnredirect "deleted-user.tcl?user_id=$user_id"
return
}
"need_email_verification_and_admin_approv" {
ns_returnredirect "awaiting-email-verification.tcl?user_id=$user_id"
return
}
"need_admin_approv" {
ns_returnredirect "awaiting-approval.tcl?user_id=$user_id"
return
}
"need_email_verification" {
ns_returnredirect "awaiting-email-verification.tcl?user_id=$user_id"
return
}
"rejected" {
ns_returnredirect "awaiting-approval.tcl?user_id=$user_id"
return
}
default {
ns_log Warning "Problem with registration state machine on user-login.tcl"
ad_return_error "Problem with login" "There was a problem authenticating the account: $user_id. Most likely, the database contains users with no user_state."
return
}
}
In this case, we're using the standard
switch
behavior of matching strings exactly. We're also provide a "default" keyword at the end that indicates some code to run if nothing else matched.
It is possible to use more sophisticated patterns in switch
. Here's a fragment that sends different email depending on the pattern of the address:
switch -glob $email {
{*mit.edu} { ns_sendmail $email $from $subject $body }
{*cmu.edu} { ns_sendmail $email $from $subject "$body\n\nP.S. Consider applying to MIT. Boston is much nicer than Pittsburgh"}
{*harvard.edu} { ns_sendmail $email $from $subject "$body\n\nP.S. Please ask your parents to invest in our tech startup."}
}
The third behavior for switch
is invoked using the "-regexp" flag. See the pattern matching chapter for more on how these patterns work.
More: http://www.tcl.tk/man/tcl8.4/TclCmd/switch.htm
Looping commands while
, foreach
, and for
The while
command in Tcl operates as follows:
while { conditional_statement } {
loop_body_statements
}
The conditional statement is evaluated; if it is true, the loop body statement is executed, and then the conditional statement is reevaluated and the process repeats. If the conditional statement is ever false, the interpreter does not execute the loop body statements, and continues to the next line after the conditional.
Here is a while
statement used to display the last name, first name of each MIT nerd using a Web service. The conditional is the result of calling AOLserver's ns_db getrow
API procedure. This procedure returns 1 if it can fetch the next row from the SQL cursor, 0 if there aren't any more rows to fetch.
set selection [ns_db select $db "select first_names, last_name from users \
where lower(email) like '%mit.edu'"]
while { [ns_db getrow $db $selection] } {
# set local variables to the SQL column names
set_variables_after_query
ns_write "<LI>$last_name, $first_names"
}
More: http://www.tcl.tk/man/tcl8.4/TclCmd/while.htm
The Tcl foreach
command loops through the elements of a list, setting a loop variable to each element in term:
foreach variable_name list {
body
}
Here's an example from
http://software.arsdigita.com/www/monitor.tcl, a page that displays current server activity:
# ask AOLserver to return a list of lists, one for each current connection
set connections [ns_server active]
foreach connection $connections {
# separate the sublist elements with "
" tags
ns_write $conn "
[join $connection "
"]"
}
The program
http://sofware.arsdigita.com/www/admin/static/link-check.tcl checks every HTML file in an ArsDigita Community System for dead links. Here's a helper procedure that works on one file:
proc check_file {f} {
# alert the administrator that we're working on this file
ns_write "<li>$f\n<ul>\n"
# read the contents into $content
set stream [open $f]
set content [read $stream]
close $stream
# loop through each reference, relying on API call ns_hrefs
# to parse the HTML and tell us where this file points
foreach url [ns_hrefs $content] {
# do all the hard work
...
}
ns_write "</ul>\n"
}
Notice how easy this procedure was to write thanks to the AOLserver developers thoughtfully providing us with ns_hrefs
, which takes an HTML string and returns a list of every HREF target.
More: http://www.tcl.tk/man/tcl8.4/TclCmd/foreach.htm
The last looping command, for
, is good for traditional "for i from 1 to 10" kind of iteration. Here's the syntax:
for start test next body
We use this control structure in the winner picking admin page of the ArsDigita Comunity System's contest module:
http://software.arsdigita.com/www/admin/contest/pick-winners.tcl. The input to this page specifies a time period, a contest, and how many winners are to be picked. Here the result of executing the
for
loop is a list of N elements, where N is the number of desired winners:
for {set i 1} {$i <= $n_winners} {incr i} {
# we'll have winner_numbers between 0 and $n_contestants - 1
# because randomRange returns a random integer between 0
# and its argument
lappend winner_numbers [randomRange $n_contestants]
}
More: http://www.tcl.tk/man/tcl8.4/TclCmd/for.htm
Error-handling command: catch
If a Tcl command throws an error in a CGI script or an AOLserver API page, by default the user will be presented with an error page. If you don't want that to happen, fix your bugs! Sometimes it isn't possible to fix your bugs. For example, the ns_httpget
API procedure fetches a Web page from the wider Internet. Under certain network-dependent conditions, it may throw an error. If you don't want your users to be exposed to that as an error, put in a catch:
catch script ?variable_name?
catch
returns 1 if
script
threw an error, 0 otherwise. If you supply the second argument (
variable_name
),
catch
will set that variable to the result of executing
script
, whether or not the script threw an error.
Our classic example always involves ns_httpget. Here's one from http://www.webho.com/WealthClock:
# define a procedure that computes the entire page
proc wealth_ReturnWholePage {} {
# do a couple of ns_httpgets and some arithmetic
# to produce the user-visible HTML
...
}
# this is the procedure registered to http://www.webho.com/WealthClock
proc wealth_Top {ignore} {
if [catch {set moby_string [Memoize wealth_ReturnWholePage]} errmsg] {
# something went wrong with our sources
... return an apology message to the users
} else {
# we computed the result (or Memoize got it from the cache)
ns_return 200 text/html $moby_string
}
}
Sending email is another time that a Web server has to go outside its carefully controlled world and might experience an error. Here is the entire http://software.arsdigita.com/tcl/ad-monitor.tcl, which implements a central facility for other sections in the ArsDigita Community System. The idea is that programmers can put in "email the administrator if broken" instructions on pages that won't result in a nightmare for the administrator if the page is getting hit every few seconds.
# the overall goal here is that the ad_host_administrator gets
# notified if something is horribly wrong, but not more than once
# every 15 minutes
# we store the last [ns_time] (seconds since 1970) notification time
# in ad_host_administrator_last_notified
ns_share -init { set ad_host_administrator_last_notified 0 } ad_host_administrator_last_notified
proc ad_notify_host_administrator {subject body {log_p 0}} {
ns_share ad_host_administrator_last_notified
if $log_p {
# usually the error will be in the error log anyway
ns_log Notice "ad_notify_host_administrator: $subject\n\n$body\n\n"
}
if { [ns_time] > [expr $ad_host_administrator_last_notified + 900] } {
# more than 15 minutes have elapsed since last note
set ad_notify_host_administrator [ns_time]
if [catch { ns_sendmail [ad_host_administrator] [ad_system_owner] $subject $body } errmsg] {
ns_log Error "failed sending email note to [ad_host_administrator]"
}
}
}
Make sure that you don't overuse catch. The last thing that you want is a page failing silently. Genuine errors should always be brought to a user's attention and ideally to the site administrator's. Users should not think that a server has done something on their behalf when in fact the task was not accomplished.
More: http://www.tcl.tk/man/tcl8.4/TclCmd/catch.htm
Miscellaneous commands: break
, continue
, return
, and error
When inside a looping command, it is sometimes desirable to get the command to stop looping or to stop executing the current iteration but to continue on the next one. The break
command is used to permanently escape the loop; the continue
command is used to escape the current iteration of the loop but to start again at the next iteration. The syntax for each consists only of the appropriate word written on a line by itself within a loop.
We often use the break
command when we want to limit the number of rows to display from the database. Here's an example from the photo.net neighbor-to-neighbor system. By default, we only want to show a "reasonable" number of postings on one page:
set selection [ns_db select $db ... big SQL query ... ]
set list_items ""
# see what the publisher thinks is a reasonable number (default to 100)
set n_reasonable [parameter::get -parameter NReasonablePostings -default 100]
# initialize a counter of the number of rows displayed so far
set counter 0
while {[ns_db getrow $db $selection]} {
set_variables_after_query
incr counter
if { $counter > $n_reasonable) } {
# append ellipses
append list_items "<p>\n..."
# flush the database cursor (tell Oracle that we don't need the
# rest of the rows)
ns_db flush $db
# break out of the loop
break
}
append list_items "<li><a href=\"view-one.tcl ..."
}
More: http://www.tcl.tk/man/tcl8.4/TclCmd/break.htm
The return
command has been shown before. It quits the proc it's in and returns the supplied value. Remember that any procedure lines after return
aren't executed. Too many times we've seen code of the following form:
proc a_new_programmers_proc {} {
set db [ns_db gethandle]
# do a bunch of stuff with the database
return $result
# release the database handle
ns_db releasehandle $db
}
The most interesting thing that you can do with return is write procedures that force their callers to return as well. Here's an example from
http://software.arsdigita.com/tcl/ad-security.tcl:
proc ad_maybe_redirect_for_registration {} {
if { [ad_verify_and_get_user_id] != 0 } {
# user is in fact logged in, return happiness
return
} else {
ns_returnredirect "/register/index.tcl?return_url=[ns_urlencode [ns_conn url]$url_args]"
# blow out of 2 levels
return -code return
}
}
A .tcl page can simply call this in-line
ad_maybe_redirect_for_registration
# the code below will never get executed if the user isn't registered
# ... update the database or whatever ...
More: http://www.tcl.tk/man/tcl8.4/TclCmd/return.htm
The error
command returns from a proc and and raises an error that, if not caught by a catch
statement, will result in the user seeing a server error page. The first argument to error
is displayed in the debugging backtrace:
proc divide {x y} {
if {$y == 0} {
error "Can't divide by zero."
} else {
return [expr {$x / $y}]
}
}
More: http://www.tcl.tk/man/tcl8.4/TclCmd/error.htm
---
based on Tcl for Web Nerds