rp-design.xml

Delivered as text/xml

[ hide source ] | [ make this the default ]

File Contents

<?xml version='1.0' ?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
               "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd" [
<!ENTITY % myvars SYSTEM "../variables.ent">
%myvars;
]>
<sect1 id="rp-design" xreflabel="OpenACS 4 Request Processor Design">
<title>Request Processor Design</title>


<authorblurb>
<para>By <ulink url="http://planitia.org">Rafael H. Schloming</ulink> </para>
</authorblurb>


<sect2 id="rp-design-essentials">
<title>Essentials</title>


<itemizedlist>
<listitem>
<para><xref linkend="rp-requirements"/></para>
</listitem>

<listitem>
<para><ulink url="request-processor">Request Processor Stages and API</ulink></para>
</listitem>

<listitem><para><ulink url="/api-doc/procs-file-view?path=packages/acs-tcl/tcl/request-processor-procs.tcl">
/packages/acs-tcl/tcl/request-processor-procs.tcl</ulink></para></listitem>

<listitem><para><ulink url="/api-doc/procs-file-view?path=packages/acs-tcl/tcl/request-processor-init.tcl">
/packages/acs-tcl/tcl/request-processor-init.tcl</ulink></para></listitem>

<listitem><para><ulink url="/api-doc/procs-file-view?path=packages/acs-tcl/tcl/site-nodes-procs.tcl">
/packages/acs-tcl/tcl/site-nodes-procs.tcl</ulink></para></listitem>

<listitem><para><ulink url="/api-doc/procs-file-view?path=packages/acs-tcl/tcl/site-nodes-init.tcl">
/packages/acs-tcl/tcl/site-nodes-init.tcl</ulink></para></listitem>

<listitem><para><ulink url="/doc/sql/display-sql?package_key=acs-kernel&amp;url=site-nodes-create.sql">
/packages/acs-kernel/sql/site-nodes-create.sql</ulink></para></listitem>
</itemizedlist>

</sect2>

<sect2 id="rp-design-intro">
<title>Introduction</title>

<para>The request processor is the set of procs that responds to every HTTP
request made to the OpenACS. The request processor must authenticate the
connecting user, and make sure that he is authorized to perform the given
request. If these steps succeed, then the request processor must locate the
file that is associated with the specified URL, and serve the content it
provides to the browser.</para>


</sect2>

<sect2 id="rp-design-related-systems">
<title>Related Systems</title>


<itemizedlist>
<listitem><para><xref linkend="apm-design"/></para></listitem>
</itemizedlist>

</sect2>

<sect2 id="rp-design-terminology">
<title>Terminology</title>


<itemizedlist>
<listitem><para>
<emphasis role="strong">pageroot</emphasis> -- Any directory that contains scripts and/or
static files intended to be served in response to HTTP requests. A typical
OpenACS installation is required to serve files from multiple pageroots.</para>
</listitem>

<listitem>
<para><emphasis role="strong">global pageroot</emphasis>
(<emphasis role="strong">/var/lib/aolserver/<emphasis>servicename</emphasis>/www</emphasis>) -- Files appearing under
this pageroot will be served directly off the base url
http://www.<emphasis>servicename</emphasis>.com/</para>
</listitem>

<listitem>
<para><emphasis role="strong">package root</emphasis>
(<emphasis role="strong">/var/lib/aolserver/<emphasis>servicename</emphasis>/packages</emphasis>) -- Each subdirectory of
the package root is a package. A typical OpenACS installation will have several
packages.</para>
</listitem>

<listitem>
<para><emphasis role="strong">package pageroot</emphasis>
(<emphasis role="strong">/var/lib/aolserver/<emphasis>servicename</emphasis>/packages/<emphasis>package_key</emphasis>/www</emphasis>)
-- This is the pageroot for the <emphasis>package_key</emphasis> package.</para>
</listitem>

<listitem>
<para><emphasis role="strong">request environment</emphasis> (<emphasis role="strong">ad_conn</emphasis>) -- This is
a global namespace containing variables associated with the current
request.</para>
</listitem>

<listitem>
<para><emphasis role="strong">abstract URL</emphasis> -- A URL with no extension that doesn&#39;t
directly correspond to a file in the filesystem.</para>
</listitem>

<listitem>
<para><emphasis role="strong">abstract file</emphasis> or <emphasis role="strong">abstract path</emphasis> -- A URL
that has been translated into a filesystem path (probably by prepending the
appropriate pageroot), but still doesn&#39;t have any extension and so does
not directly correspond to a file in the filesystem.</para>
</listitem>

<listitem>
<para><emphasis role="strong">concrete file</emphasis> or <emphasis role="strong">concrete path</emphasis> -- A file
or path that actually references something in the filesystem.</para>
</listitem>
</itemizedlist>

</sect2>

<sect2 id="rp-design-system-overview">
<title>System Overview</title>


<para><emphasis role="strong">Package Lookup</emphasis></para>

<para>One of the first things the request processor must do is to determine
which package instance a given request references, and based on this
information, which pageroot to use when searching for a file to serve. During
this process the request processor divides the URL into two pieces. The first
portion identifies the package instance. The rest identifies the path into
the package pageroot. For example if the news package is mounted on
/offices/boston/announcements/, then a request for
/offices/boston/announcements/index would be split into the
<emphasis role="strong">package_url</emphasis> (/offices/boston/announcements/), and the
abstract (no extension info) file path (index). The request processor must be
able to figure out which <emphasis role="strong">package_id</emphasis> is associated with a
given package_url, and package mountings must be persistent across server
restarts and users must be able to manipulate the mountings on a live site,
therefore, this mapping is stored in the database.</para>

<para><emphasis role="strong">Authentication and Authorization</emphasis></para>

<para>Once the request processor has located both the package_id and concrete
file associated with the request, authentication is performed by the <ulink url="security-design.html">session</ulink> security system. After authentication has
been performed the user is authorized to have read access for the given
package by the <xref linkend="permissions-design"/>.
If authorization succeeds then the request is served, otherwise it is
aborted.</para>

<para><emphasis role="strong">Concrete File Search</emphasis></para>

<para>To actually serve a file, the request processor generates an ordered list
of abstract paths and searches each path for a concrete file. The first path
searched is composed of the package pageroot with the extra portion of the
URL appended. The second abstract path consists of the global pageroot with
the full URL appended. This means that if an instance of the news package is
mounted on /offices/boston/announcements/, then any requests that are not
matched by something in the news package pageroot could be matched by
something under the global pageroot in the /offices/boston/announcements/
directory. Files take precedence over directory listings, so an index file in
the global pageroot will be served instead of a directory listing in the
package pageroot, even though the global pageroot is searched later. If a
file is found at any of the searched locations then it is served.</para>

<para><emphasis role="strong">Virtual URL Handlers</emphasis></para>

<para>If no file is found during the concrete file search, then the request
processor searches the filesystem for a <emphasis role="strong">virtual url handler</emphasis>
(<emphasis role="strong">.vuh</emphasis>) file. This file contains normal Tcl code, and is in
fact handled by the same extension handling procedure that handles .tcl
files. The only way this file is treated differently is in how the request
processor searches for it. When a lookup fails, the request processor
generates each valid prefix of all the abstract paths considered in the
concrete file search, and searches these prefixes in order from most specific
to least specific for a matching .vuh file. If a file is found then the
ad_conn variable <emphasis role="strong">path_info</emphasis> is set to the portion of the url
<emphasis>not</emphasis> matched by the .vuh script, and the script is sourced. This
facility is intended to replace the concept of registered procs, since no
special distinction is required between sitewide procs and package specific
procs when using this facility. It is also much less prone to overlap and
confusion than the use of registered procs, especially in an environment with
many packages installed.</para>

</sect2>

<sect2 id="rp-design-site-nodes">
<title>Site Nodes</title>

<para>The request processor manages the mappings from URL patterns to package
instances with the site_nodes data model. Every row in the site_nodes table
represents a fully qualified URL. A package can be mounted on any node in
this data model. When the request processor performs a URL lookup, it
determines which node matches the longest possible prefix of the request URI.
In order to make this lookup operation as fast as possible, the rows in the
site_nodes table are pulled out of the database at server startup, and stored
in memory.</para>

<para>The memory structure used to store the site_nodes mapping is a hash table
that maps from the fully qualified URL of the node, to the package_id and
package_key of the package instance mounted on the node. A lookup is
performed by starting with the full request URI and successively stripping
off the rightmost path components until a match is reached. This way the time
required to lookup a URL is proportional to the length of the URL, not to the
number of entries in the mapping.</para>

</sect2>

<sect2 id="rp-design-req-env">
<title>Request Environment</title>


<para>The request environment is managed by the procedure
<emphasis role="strong">ad_conn</emphasis>. Variables can be set and retrieved through use of
the ad_conn procedure. The following variables are available for public use.
If the ad_conn procedure doesn&#39;t recognize a variable being passed to it
for a lookup, it tries to get a value using ns_conn. This guarantees that
ad_conn subsumes the functionality of ns_conn.</para>

 
<informaltable frame="none">
<tgroup cols="2">
<colspec colname="c1"/>
<colspec colname="c2"/>
<spanspec spanname="hspan" namest="c1" nameend="c2"/>
<tbody>

<row>
<entry spanname="hspan"><emphasis role="strong">Request processor</emphasis></entry>
</row>

<row>
<entry valign="top"><computeroutput>[ad_conn urlv]</computeroutput> </entry>
<entry valign="top">A list containing each element of the URL</entry>
</row>

<row>
<entry valign="top"><computeroutput>[ad_conn url]</computeroutput> </entry>
<entry valign="top">The URL associated with the request.</entry>
</row>

<row>
<entry valign="top"><computeroutput>[ad_conn query]</computeroutput> </entry>
<entry valign="top">The portion of the URL from the ? on (i.e. GET
              variables) associated with the request.</entry>
</row>

<row>
<entry valign="top"><computeroutput>[ad_conn file]</computeroutput> </entry>
<entry valign="top">The filepath including filename of the file being served</entry>
</row>

<row>
<entry valign="top"><computeroutput>[ad_conn request]</computeroutput> </entry>
<entry valign="top">The number of requests since the server was last started</entry>
</row>

<row>
<entry valign="top"><computeroutput>[ad_conn start_clicks]</computeroutput> </entry>
<entry valign="top">The system time when the RP starts handling the request</entry>
</row>

<row>
<entry spanname="hspan"> </entry>
</row>

<row>
<entry spanname="hspan"><emphasis role="strong">Session System Variables</emphasis>: set in
sec_handler, check security with ad_validate_security_info</entry>
</row>

<row>
<entry valign="top"><computeroutput>[ad_conn session_id]</computeroutput> </entry>
<entry valign="top">The unique session_id coming from the sequence
<computeroutput>sec_id_seq</computeroutput></entry>
</row>

<row>
<entry valign="top"><computeroutput>[ad_conn user_id]</computeroutput> </entry>
<entry valign="top">User_id of a person if the person is logged in. Otherwise, it is
blank</entry>
</row>

<row>
<entry valign="top"><computeroutput>[ad_conn sec_validated]</computeroutput> </entry>
<entry valign="top">This becomes &quot;secure&quot; when the connection uses SSL</entry>
</row>

<row>
<entry spanname="hspan"> </entry>
</row>

<row>
<entry spanname="hspan"><emphasis role="strong">Database API</emphasis></entry>
</row>

<row>
<entry valign="top"><computeroutput>[ad_conn db,handles]</computeroutput> </entry>
<entry valign="top">What are the list of handles available to AOL?</entry>
</row>

<row>
<entry valign="top"><computeroutput>[ad_conn db,n_handles_used]</computeroutput> </entry>
<entry valign="top">How many database handles are currently used?</entry>
</row>

<row>
<entry valign="top"><computeroutput>[ad_conn db,last_used]</computeroutput> </entry>
<entry valign="top">Which database handle did we use last?</entry>
</row>

<row>
<entry valign="top"><computeroutput>[ad_conn db,transaction_level,$db]</computeroutput> </entry>
<entry valign="top">Specifies what transaction level we are in</entry>
</row>

<row>
<entry valign="top"><computeroutput>[ad_conn db,db_abort_p,$dbh]</computeroutput> </entry>
<entry valign="top">Whether the transaction is aborted</entry>
</row>

<row>
<entry spanname="hspan"> </entry>
</row>

<row>
<entry spanname="hspan"><emphasis role="strong">APM</emphasis></entry>
</row>

<row>
<entry valign="top"><computeroutput>[ad_conn xml_loaded_p]</computeroutput> </entry>
<entry valign="top">Checks whether the XML parser is loaded so that it only gets loaded once.
Set in apm_load_xml_packages</entry>
</row>

<row>
<entry spanname="hspan"> </entry>
</row>

<row>
<entry spanname="hspan"><emphasis role="strong">Packages</emphasis></entry>
</row>

<row>
<entry valign="top"><computeroutput>[ad_conn package_id]</computeroutput> </entry>
<entry valign="top">The package_id of the package associated with the URL.</entry>
</row>

<row>
<entry valign="top"><computeroutput>[ad_conn package_url]</computeroutput> </entry>
<entry valign="top">The URL on which the package is mounted.</entry>
</row>

<row>
<entry spanname="hspan"> </entry>
</row>

<row>
<entry spanname="hspan"><emphasis role="strong">Miscellaneous</emphasis></entry>
</row>

<row>
<entry valign="top"><computeroutput>[ad_conn system_p]</computeroutput> </entry>
<entry valign="top">If true then the request has been made to one of the special directories
specified in the config file (somewhere), and no authentication or
authorization has been performed.</entry>
</row>

<row>
<entry spanname="hspan"> </entry>
</row>

<row>
<entry spanname="hspan"><emphasis role="strong">Documentation</emphasis></entry>
</row>

<row>
<entry valign="top"><computeroutput>[ad_conn api_page_documentation_mode_p]</computeroutput> </entry>
<entry valign="top"></entry>
</row>
</tbody></tgroup></informaltable>


</sect2>

</sect1>