V.4 Pattern matching
Pattern matching is important across a wide variety of Web programming tasks but most notably when looking for exceptions in user-entered data and when trying to parse information out of non-cooperating Web sites.
Tcl's pattern matching facilities test whether a given string matches a specified pattern. Patterns are described using a syntax known as regular expressions. For example, the pattern expression consisting of a single period matches any character. The pattern a..a matches any four-character string whose first and last characters are both a.
The regexp command takes a pattern, a string, and an optional match variable. It tests whether the string matches the pattern, returns 1 if there is a match and zero otherwise, and sets the match variable to the part of the string that matched the pattern:
% set something candelabra
candelabra
% regexp a..a $something match
1
% set match
abra
Patterns can also contain subpatterns (delimited by parentheses) and denote repetition. A star denotes zero or more occurrences of a pattern, so a(.*)a matches any string of at least two characters that begins and ends with the character a. Whatever has matched the subpattern between the a's will get put into the first subvariable:
% set something candelabra
candelabra
% regexp a(.*)a $something match
1
% set match
andelabra
Note that Tcl regexp by default behaves in a greedy fashion. There are three alternative substrings of "candelabra" that match the regexp a(.*)a: "andelabra", "andela", and "abra". Tcl chose the longest substring. This is very painful when trying to pull HTML pages apart:
% set simple_case "Normal folks might say <i>et cetera</i>"
Normal folks might say <i>et cetera</i>
% regexp {<i>(.+)</i>} $simple_case match italicized_phrase
1
% set italicized_phrase
et cetera
% set some_html "Pedants say <i>sui generis</i> and <i>ipso facto</i>"
Pedants say <i>sui generis</i> and <i>ipso facto</i>
% regexp {<i>(.+)</i>} $some_html match italicized_phrase
1
% set italicized_phrase
sui generis</i> and <i>ipso facto
What you want is a non-greedy regexp, a standard feature of Perl and an option in Tcl 8.1 and later versions (see http://www.scriptics.com/services/support/howto/regexp81.html).
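As a minimal sketch of the non-greedy alternative, assuming Tcl 8.1 or later: writing the quantifier as .+? asks for the shortest match instead of the longest. The variable name all_phrases is introduced here only for illustration.

```tcl
set some_html "Pedants say <i>sui generis</i> and <i>ipso facto</i>"

# .+? is the non-greedy form of .+ : it stops at the first </i> it can reach
regexp {<i>(.+?)</i>} $some_html match italicized_phrase

# -all -inline returns a flat list: whole match, submatch, whole match, submatch, ...
set all_phrases {}
foreach {whole phrase} [regexp -all -inline {<i>(.+?)</i>} $some_html] {
    lappend all_phrases $phrase
}

puts $italicized_phrase
puts $all_phrases
```

With the non-greedy pattern, italicized_phrase holds only the first phrase, and the loop collects each italicized phrase separately.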
Lisp systems in the 1970s included elegant ways of returning all possibilities when there were multiple matches for an expression. Java libraries, Perl, and Tcl demonstrate the progress of the field of computer science by ignoring these superior systems of decades past.
Matching Cookies From the Browser
A common problem in Web development is pulling information out of cookies that come from the client. The cookie spec at http://home.netscape.com/newsref/std/cookie_spec.html mandates that multiple cookies be separated by semicolons. So you look for "the cookie name that you've been using" followed by an equals sign and then slurp up anything that follows that isn't a semicolon. Here is how the ArsDigita Community System looks for the value of the last_visit cookie:
regexp {last_visit=([^;]+)} $cookie match last_visit
Note the square brackets inside the regexp. The Tcl interpreter isn't trying to call a procedure because the entire regexp has been grouped with braces rather than double quotes. Square brackets denote a range of acceptable characters:
- [A-Z] would match any uppercase character
- [ABC] would match any of the first three characters in the alphabet (uppercase only)
- [^ABC] would match any character other than the first three uppercase characters in the alphabet, i.e., the ^ reverses the sense of the brackets
The plus sign after the [^;] says "one or more characters that meet the preceding spec", i.e., "one or more characters that aren't a semicolon". It is distinguished from * in that there must be at least one character for a match.
If successful, the regexp command above will set the match variable with the complete matching string, starting from "last_visit=". Our code doesn't make any use of this variable but only looks at the subvar last_visit that would also have been set.
Pages that use this cookie expect an integer, and this code failed in one case where a user edited his cookies file and corrupted it so that his browser was sending several thousand bytes of garbage after the "last_visit=". A better approach might have been to limit the match to digits:
regexp {last_visit=([0-9]+)} $cookie match last_visit
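The digits-only match can be wrapped in a small helper. The sketch below is an illustration, not code from the ArsDigita Community System: the procedure name get_last_visit and the sample cookie strings are assumptions. It also anchors the cookie name and the end of the value, so that a cookie such as my_last_visit or a value with trailing garbage is rejected rather than partly matched.

```tcl
# Hypothetical helper: return the last_visit value from a Cookie header,
# or the empty string if it is missing or not a plain run of digits.
proc get_last_visit {cookie} {
    # (^|; *) requires the name to start the header or follow a semicolon;
    # (;|$) requires the digits to be the whole value
    if {[regexp {(^|; *)last_visit=([0-9]+)(;|$)} $cookie match prefix last_visit]} {
        return $last_visit
    }
    return ""
}

# Sample header values, made up for this example
puts [get_last_visit "session_id=abc123; last_visit=1012345678; theme=dark"]
puts [get_last_visit "last_visit=123garbage"]
```

Returning the empty string for anything unexpected lets the calling page fall back to a default instead of failing on corrupted input.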
Matching Into Multiple Variables
More generally regexp allows multiple pattern variables. The pattern variables after the first are set to the substrings that matched the subpatterns. Here is an example of matching a credit card expiration date entered by a user:
% set date_typed_by_user "06/02"
06/02
% regexp {([0-9][0-9])/([0-9][0-9])} $date_typed_by_user match month year
1
% set month
06
% set year
02
Each pair of parentheses corresponds to a subpattern variable.
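A sketch of how the captured month and year might be checked before use. The procedure name parse_expiration is hypothetical. It anchors the pattern with ^ and $ so that extra characters are rejected, and converts the month with scan, because a string such as "08" would otherwise be read as an invalid octal number in a Tcl 8.x expression.

```tcl
# Hypothetical helper: return {month year} for input like "06/02",
# or an empty list if the input is not a plausible MM/YY date.
proc parse_expiration {date_typed_by_user} {
    if {![regexp {^([0-9][0-9])/([0-9][0-9])$} $date_typed_by_user match month year]} {
        return {}
    }
    # scan with %d reads "08" as decimal 8 rather than as octal
    scan $month %d month_number
    if {$month_number < 1 || $month_number > 12} {
        return {}
    }
    return [list $month $year]
}

puts [parse_expiration "06/02"]
puts [llength [parse_expiration "13/02"]]
```

The regexp only guarantees the shape of the input; the range check on the month is a separate step.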
Full Syntax
The most general form of regexp includes optional flags as well as multiple match variables:
regexp [flags] pattern data matched_result var1 var2 ...
The various flags are
- -nocase: uppercase characters in the data are bashed down to lower for case-insensitive matching (older Tcl releases required an all-lowercase pattern, so keeping the pattern lowercase is still a safe habit)
- -indices: the returned values of the regexp contain the indices delimiting the matched substring, rather than the strings themselves.
- If your pattern begins with a -, put a -- flag at the end of your flags
Patterns are built from the following elements:
- . matches any character.
- * matches zero or more instances of the previous pattern item.
- + matches one or more instances of the previous pattern item.
- ? matches zero or one instances of the previous pattern item.
- | disjunction, e.g., (a|b) matches an a or a b
- ( ) groups a sub-pattern.
- [ ] delimits a set of characters. ASCII ranges are specified using hyphens, e.g., [A-Z] matches any uppercase letter and [A-Za-z] matches any alphabetic character. (A range such as [A-z] also includes the six punctuation characters that sit between Z and a in ASCII.) If the first character in the set is ^, this complements the set, e.g., [^A-Za-z] matches any non-alphabetic character.
- ^ matches only when the pattern appears at the beginning of the string. The ^ must appear at the beginning of the pattern expression.
- $ matches only when the pattern appears at the end of the string. The $ must appear last in the pattern expression.
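The flags can be tried out with a short script. This is only a sketch; the sample string and the variable names are made up for the example.

```tcl
set data "Contact: Webmaster@Example.COM today"

# -nocase: the lowercase pattern still matches the mixed-case data
set found [regexp -nocase {webmaster@([a-z.]+)} $data match domain]

# -indices: the variables receive {first last} index pairs instead of strings
regexp -nocase -indices {webmaster@([a-z.]+)} $data match_range domain_range
set domain_again [string range $data [lindex $domain_range 0] [lindex $domain_range 1]]

# --: ends the flags, so a pattern that starts with - is not read as a flag
set has_dash_v [regexp -- {-v} "tclsh -v"]

puts "$found $domain $domain_again $has_dash_v"
```

Feeding the index pair back into string range recovers the same substring that the plain form stores directly in domain.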
More: http://www.tcl.tk/man/tcl8.4/TclCmd/regexp.htm
Matching with substitution
It's common in Web programming to create strings by substitution. Tcl's regsub command performs substitution based on a pattern:
regsub [flags] pattern data replacements var
matches the pattern against the data. If the match succeeds, the variable named var is set to data, with various parts modified, as specified by replacements. If the match fails, var is simply set to data. The value returned by regsub is the number of replacements performed.
The flag -all specifies that every occurrence of the pattern should be replaced. Otherwise only the first occurrence is replaced. Other flags include -nocase and -- as with regexp.
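A small sketch of the difference between the default behavior and -all, and of the value regsub returns. The sample sentence and variable names are invented for illustration.

```tcl
set data "one fish, two fish, red fish"

# Without -all only the first occurrence is replaced
set n_first [regsub {fish} $data {cat} first_only]

# With -all every occurrence is replaced
set n_all [regsub -all {fish} $data {cat} every_one]

# When nothing matches, the result variable is just a copy of the data
set n_none [regsub {dog} $data {cat} unchanged]

puts "$n_first: $first_only"
puts "$n_all: $every_one"
puts "$n_none: $unchanged"
```

Checking the returned count is a cheap way to find out whether a substitution actually happened.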
Here's an example from the banner ideas module of the ArsDigita Community System (see http://photo.net/doc/bannerideas.html). The goal is that each banner idea contain a linked thumbnail image. To facilitate cutting and pasting of the image html, we don't require that the publisher include uniform subtags within the IMG. However, we use regsub to clean up:
# turn "<img align=right hspace=5" into "<img align=left border=0 hspace=8"
regsub -nocase {align=[^ ]+} $picture_html "" without_align
regsub -nocase {hspace=[^ ]+} $without_align "" without_hspace
regsub -nocase {<img} $without_hspace {<img align=left border=0 hspace=8} final_photo_html
In the example above, replacements specified only literal strings, including the empty string "". Other replacement directives include:
- & inserts the string that matched the pattern
- The backslashed numbers \1 through \9 insert the strings that matched the corresponding sub-patterns in the pattern.
Here's another example: replacing HTML comments (delimited by <!-- and -->) by the comment text, enclosed in parentheses.
% proc extract_comment_text {html} {
regsub -all {<!--([^-]*)-->} $html {(\1)} with_exposed_comments
return $with_exposed_comments
}
% extract_comment_text {<!--insert the price below-->
We give the same low price to everyone: $219.99
<!--make sure to query out discount if this is one of our big customers-->}
(insert the price below)
We give the same low price to everyone: $219.99
(make sure to query out discount if this is one of our big customers)
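The two replacement directives can be compared side by side. The following is a sketch with a made-up input string: & stands for the whole match, while \1 and \2 stand for the parenthesized sub-patterns.

```tcl
set text "Call 555-1234 or 555-9876"

# & inserts the entire matched string
set count [regsub -all {[0-9]+-[0-9]+} $text {<b>&</b>} bolded]

# \1 and \2 insert what the first and second sub-patterns matched
regsub -all {([0-9]+)-([0-9]+)} $text {\2-\1} swapped

puts $bolded
puts $swapped
```

Putting the replacement in braces keeps Tcl from interpreting the backslashes before regsub sees them.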
More: http://www.tcl.tk/man/tcl8.4/TclCmd/regsub.htm
String match
Tcl provides an alternative matching mechanism that is simpler for users to understand than regular expressions. The Tcl command string match uses "GLOB-style" matching. Here is the syntax:
string match pattern data
It returns 1 if there is a match and 0 otherwise. The only pattern elements permitted here are ?, which matches any single character; *, which matches any sequence; and [], which delimits a set of characters or a range. This differs from regexp in that the pattern must match the entire string supplied:
% regexp "foo" "foobar"
1
% string match "foo" "foobar"
0
% # here's what we need to do to make the string match
% # work like the regexp
% string match "*foo*" foobar
1
Here's an example of the character range system in use:
string match {*[0-9]*} $text
returns 1 if text contains at least one digit and 0 otherwise.
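A few more glob patterns, as a sketch. The procedure name is_page_file and the file names are hypothetical; the point is that each pattern has to cover the whole string.

```tcl
# Hypothetical helper: true if the file name ends in .tcl or .adp
proc is_page_file {filename} {
    return [expr {[string match {*.tcl} $filename] || [string match {*.adp} $filename]}]
}

set one_char   [string match {a?c} abc]     ;# ? stands for exactly one character
set two_chars  [string match {a?c} abbc]    ;# two characters between a and c
set date_shape [string match {[0-9][0-9]/[0-9][0-9]} 06/02]

puts "$one_char $two_chars $date_shape"
puts "[is_page_file index.tcl] [is_page_file index.tcl.bak]"
```

Because the pattern must match the entire string, index.tcl.bak is not treated as a .tcl file even though it contains ".tcl".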
More: http://www.tcl.tk/man/tcl8.4/TclCmd/string.htm
Exercises
1.
- Write a procedure which takes a string and checks that it contains an "@" sign
- Extend the procedure to make sure that only letters and numbers are allowed before the "@" sign
- Extend the procedure to check that after the @ sign comes a valid domain (hint, look at 2.) A valid domain contains at least one "." and only letters after the last ".", so malte.cognovis.de is a valid domain and cognovis.d1e is not.
- Extend the procedure to return "Welcome foo, member of bar.com" if the string is "foo@bar.com"
- Extend the procedure to return "Welcome OpenACS member foo" if the string is like "foo@openacs.org", meaning the e-mail ends with openacs.org
- Check against the valid domain again. This time make use of the ad_locales table installed in your local copy of OpenACS. To make this work you will have to use the OpenACS Shell.
  - Get a list of all countries from the table ad_locales. Choose the language column for this. The command to extract this is "db_list".
  - If your list contains the language "ca" more than once, make sure to limit it to one "ca" only. Make sure this works for others as well.
  - As ".com", ".org" and ".net" are also valid domain endings, append them to the list.
  - Make sure that the domain ends in one of the languages in the list you created. So automotive.ca works but automotive.eu does not (and yes, I know that .eu is now a valid domain :-)).
Answer
2.
- Search at amazon.com for your favorite book. Copy the URL up to, but not including, the "/ref..." part, e.g. http://www.amazon.com/4-Hour-Workweek-Escape-Live-Anywhere/dp/0307353133
- In the OpenACS shell use "ad_httpget" to retrieve the URL you copied. Look at the api doc for the syntax.
- Use regexp to find the price of the book in the html source returned to you by ad_httpget
- Return the price of the book.
Answer
---
based on Tcl for Web Nerds