programming-with-aolserver.xml

Delivered as text/xml
[ hide source ] | [ make this the default ]
File Contents

<?xml version='1.0' ?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
               "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd" [
<!ENTITY % myvars SYSTEM "../variables.ent">
%myvars;
]>
<sect1 id="programming-with-aolserver" xreflabel="Programming with AOLserver">
<title>Programming with AOLserver</title>


<authorblurb>
<para>By Michael Yoon, Jon Salz and Lars Pind.</para>
</authorblurb>

<sect2 id="programming-aolserver-global">
<title>The <computeroutput>global</computeroutput> command</title>

<para>
When using AOLserver, remember that there are effectively <emphasis>two</emphasis> types
of global namespace, not one: 
</para>

<orderedlist>
<listitem><para><emphasis>Server</emphasis>-global: As you&#39;d expect, there is
only one server-global namespace per server, and variables set within it can
be accessed by any Tcl code running subsequently, in any of the server&#39;s
threads. To set/get server-global variables, use AOLserver 3&#39;s <ulink url="http://www.aolserver.com/docs/nsv.adp"><computeroutput>nsv</computeroutput> API</ulink>
(which supersedes <computeroutput>ns_share</computeroutput> from the pre-3.0 API). 
</para></listitem>

<listitem><para><emphasis>Script</emphasis>-global: Each Tcl script (ADP, Tcl page,
registered proc, filter, etc.) executing within an AOLserver thread has its
own global namespace. Any variable set in the top level of a script is, by
definition, script-global, meaning that it is accessible only by subsequent
code in the same script and only for the duration of the current script
execution.</para></listitem>
</orderedlist>

<para>
The Tcl built-in command <ulink url="http://aolserver.com/docs/tcl/tcl8.3/TclCmd/global.htm"><computeroutput>global</computeroutput></ulink>
accesses script-global, <emphasis>not</emphasis> server-global, variables from within a
procedure. This distinction is important to understand in order to use
<computeroutput>global</computeroutput> correctly when programming AOLserver. 
</para>

<para>Also, AOLserver purges all script-global variables in a thread (i.e., Tcl
interpreter) between HTTP requests. If it didn&#39;t, that would affect (and
complicate) our use of script-global variables dramatically, which would then
be better described as <emphasis>thread</emphasis>-global variables. Given
AOLserver&#39;s behavior, however, &quot;script-global&quot; is a more
appropriate term.</para>

</sect2>

<sect2 id="programming-aolserver-sched-procs">
<title>Threads and Scheduled Procedures</title>

<para>
<computeroutput>ns_schedule_proc</computeroutput> and <computeroutput>ad_schedule_proc</computeroutput> each take a
<computeroutput>-thread</computeroutput> flag to cause a scheduled procedure to run
asynchronously, in its own thread. It almost always seems like a good idea to
specify this switch, but there&#39;s a problem. 
</para>

<para>It turns out that whenever a task scheduled with <computeroutput>ns_schedule_proc
-thread</computeroutput> or <computeroutput>ad_schedule_proc -thread t</computeroutput> is run, AOLserver
creates a brand new thread and a brand new interpreter, and reinitializes the
procedure table (essentially, loads all procedures that were created during
server initialization into the new interpreter). This happens <emphasis>every
time</emphasis> the task is executed - and it is a very expensive process that
should not be taken lightly!</para>

<para>The moral: if you have a lightweight scheduled procedure
which runs frequently, don&#39;t use the <computeroutput>-thread</computeroutput>
switch.</para>



<blockquote><para><emphasis>Note also that thread is initialized with a copy of what was
installed during server startup, so if the procedure table have changed since
startup (e.g. using the <link linkend="apm-design">APM</link> watch
facility), that will not be reflected in the scheduled
thread.</emphasis></para></blockquote>

</sect2>

<sect2 id="programming-aolserver-return">
<title>Using <computeroutput>return</computeroutput></title>

<para>
The <computeroutput>return</computeroutput> command in Tcl returns control to the caller procedure.
This definition allows nested procedures to work properly. However, this
definition also means that nested procedures cannot use <computeroutput>return</computeroutput> to
end an entire thread. This situation is most common in exception conditions
that can be triggered from inside a procedure e.g., a permission denied
exception. At this point, the procedure that detects invalid permission wants
to write an error message to the user, and completely abort execution of the
caller thread. <computeroutput>return</computeroutput> doesn&#39;t work, because the procedure may be
nested several levels deep. We therefore use <ulink url="/api-doc/proc-view?proc=ad%5fscript%5fabort"><computeroutput>ad_script_abort</computeroutput></ulink>
to abort the remainder of the thread. Note that using <computeroutput>return</computeroutput> instead
of <computeroutput>ad_script_abort</computeroutput> may raise some security issues: an attacker could
call a page that performed some DML statement, pass in some arguments, and
get a permission denied error -- but the DML statement would still be
executed because the thread was not stopped. Note that <computeroutput>return -code
return</computeroutput> can be used in circumstances where the procedure will only be
called from two levels deep. 
</para>

</sect2>

<sect2 id="programming-aolserver-more-values">
<title>Returning More Than One Value From a Function</title>

<para>
Many functions have a single return value. For instance, <ulink url="/api-doc/proc-view?proc=util_email_valid_p"><computeroutput>util_email_valid_p</computeroutput></ulink>
returns a number: 1 or 0. Other functions need to return a composite value.
For instance, consider a function that looks up a user&#39;s name and email
address, given an ID. One way to implement this is to return a three-element
list and document that the first element contains the name, and the second
contains the email address. The problem with this technique is that, because
Tcl does not support constants, calling procedures that returns lists in this
way necessitates the use of magic numbers, e.g.: 
</para>
 

<programlisting>
set user_info [ad_get_user_info $user_id]
set first_name [lindex $user_info 0]
set email [lindex $user_info 1]
</programlisting>


<para>AOLserver/Tcl generally has three mechanisms that we like, for returning
more than one value from a function. When to use which depends on the
circumstances.</para>

<para>Using Arrays and Pass-By-Value</para>

<para>
The one we generally prefer is returning an <ulink url="http://aolserver.com/docs/tcl/tcl8.3/TclCmd/array.htm#M8"><computeroutput>array
get</computeroutput></ulink>-formatted list. It has all the nice properties of
pass-by-value, and it uses Tcl arrays, which have good native support. 
</para>
 

<programlisting>
ad_proc ad_get_user_info { user_id } {
    db_1row user_info { select first_names, last_name, email from users where user_id = :user_id }
    return [list \
        name &quot;$first_names $last_name&quot; \
    email $email \
    namelink &quot;&lt;a href=\&quot;/shared/community-member?user_id=[ns_urlencode $user_id]\&quot;&gt;$first_names $last_name&lt;/a&gt;&quot; \
    emaillink &quot;&lt;a href=\&quot;mailto:$email\&quot;&gt;$email&lt;/a&gt;&quot;]
}

array set user_info [ad_get_user_info $user_id]

doc_body_append &quot;$user_info(namelink) ($user_info(emaillink))&quot;
</programlisting>

<para>
You could also have done this by using an array internally and using
<computeroutput>array get</computeroutput>: 
</para>
 

<programlisting>

ad_proc ad_get_user_info { user_id } {
    db_1row user_info { select first_names, last_name, email from users where user_id = :user_id }
    set user_info(name) &quot;$first_names $last_name&quot;
    set user_info(email) $email
    set user_info(namelink) &quot;&lt;a href=\&quot;/shared/community-member?user_id=[ns_urlencode $user_id]\&quot;&gt;$first_names $last_name&lt;/a&gt;&quot;
    set user_info(emaillink) &quot;&lt;a href=\&quot;mailto:$email\&quot;&gt;$email&lt;/a&gt;&quot;
    return [array get user_info]
}

</programlisting>


<para>Using Arrays and Pass-By-Reference</para>

<para>
Sometimes pass-by-value incurs too much overhead, and you&#39;d rather
pass-by-reference. Specifically, if you&#39;re writing a proc that uses
arrays internally to build up some value, there are many entries in the
array, and you&#39;re planning on iterating over the proc many times. In this
case, pass-by-value is expensive, and you&#39;d use pass-by-reference. 
</para>

<blockquote><para><emphasis>The transformation of the array into a list and back to an
array takes, in our test environment, approximately 10 microseconds per entry
of 100 character&#39;s length. Thus you can process about 100 entries per
millisecond. The time depends almost completely on the number of entries, and
almost not at all on the size of the entries.</emphasis></para></blockquote>

<para>
You implement pass-by-reference in Tcl by taking the name of an array
as an argument and <computeroutput>upvar</computeroutput> it. 
</para>
 

<programlisting>

ad_proc ad_get_user_info { 
    -array:required
    user_id 
} {
    upvar $array user_info
    db_1row user_info { select first_names, last_name, email from users where user_id = :user_id }
    set user_info(name) &quot;$first_names $last_name&quot;
    set user_info(email) $email
    set user_info(namelink) &quot;&lt;a href=\&quot;/shared/community-member?user_id=[ns_urlencode $user_id]\&quot;&gt;$first_names $last_name&lt;/a&gt;&quot;
    set user_info(emaillink) &quot;&lt;a href=\&quot;mailto:$email\&quot;&gt;$email&lt;/a&gt;&quot;
}

ad_get_user_info -array user_info $user_id

doc_body_append &quot;$user_info(namelink) ($user_info(emaillink))&quot;

</programlisting>


<para>We prefer pass-by-value over pass-by-reference. Pass-by-reference makes
the code harder to read and debug, because changing a value in one place has
side effects in other places. Especially if have a chain of
<computeroutput>upvar</computeroutput>s through several layers of the call stack, you&#39;ll have
a hard time debugging.</para>

<para>Multisets: Using <computeroutput>ns_set</computeroutput>s and Pass-By-Reference</para>

<para>
An array is a type of <emphasis>set</emphasis>, which means you can&#39;t have multiple
entries with the same key. Data structures that can have multiple entries for
the same key are known as a <emphasis>multiset</emphasis> or <emphasis>bag</emphasis>. 
</para>

<para>If your data can have multiple entries with the same key,
you should use the AOLserver built-in <ulink url="http://www.aolserver.com/docs/tcldev/tapi-120.htm#197598"><computeroutput>
ns_set</computeroutput></ulink>. You can also do a case-insensitive lookup on an
<computeroutput>ns_set</computeroutput>, something you can&#39;t easily do on an array. This is
especially useful for things like HTTP headers, which happen to have these
exact properties.</para>

<para>You always use pass-by-reference with <computeroutput>ns_set</computeroutput>s, since they
don&#39;t have any built-in way of generating and reconstructing themselves
from a string representation. Instead, you pass the handle to the set.</para>

 

<programlisting>

ad_proc ad_get_user_info {
    -set:required
    user_id
} {
    db_1row user_info { select first_names, last_name, email from users where user_id = :user_id }
    ns_set put $set name &quot;$first_names $last_name&quot;
    ns_set put $set email $email
    ns_set put $set namelink &quot;&lt;a href=\&quot;/shared/community-member?user_id=[ns_urlencode $user_id]\&quot;&gt;$first_names $last_name&lt;/a&gt;&quot;
    ns_set put $set emaillink &quot;&lt;a href=\&quot;mailto:$email\&quot;&gt;$email&lt;/a&gt;&quot;
}

set user_info [ns_set create]
ad_get_user_info -set $user_info $user_id

doc_body_append &quot;[ns_set get $user_info namelink] ([ns_set get $user_info emaillink])&quot;

</programlisting>

<para>
We don&#39;t recommend <computeroutput>ns_set</computeroutput> as a general mechanism for passing
sets (as opposed to multisets) of data. Not only do they inherently use
pass-by-reference, which we dis-like, they&#39;re also somewhat clumsy to
use, since Tcl doesn&#39;t have built-in syntactic support for them. 
</para>

<para>Consider for example a loop over the entries in a <computeroutput>ns_set</computeroutput> as
compared to an array:</para>

 

<programlisting>

# ns_set variant
set size [ns_set size $myset]
for { set i 0 } { $i &lt; $size } { incr i } {
    puts &quot;[ns_set key $myset $i] = [ns_set value $myset $i]&quot;
}

# array variant
foreach name [array names myarray] {
    puts &quot;$myarray($name) = $myarray($name)&quot;
}

</programlisting>

<para>
And this example of constructing a value: 
</para>
 

<programlisting>

# ns_set variant
set myset [ns_set create]
ns_set put $myset foo $foo
ns_set put $myset baz $baz
return $myset

# array variant
return [list
    foo $foo
    baz $baz
]

</programlisting>

<para>
<computeroutput>ns_set</computeroutput>s are designed to be lightweight, so memory consumption
should not be a problem. However, when using <computeroutput>ns_set get</computeroutput> to
perform lookup by name, they perform a linear lookup, whereas arrays use a
hash table, so <computeroutput>ns_set</computeroutput>s are slower than arrays when the number of
entries is large. 
</para>

<para><phrase role="cvstag">($Id: programming-with-aolserver.xml,v 1.10.2.1 2019/08/09 20:04:23 gustafn Exp $)</phrase></para>

</sect2>

</sect1>