- I OpenACS For Everyone
- I.1 High level information: What is OpenACS?
- I.1.1 Overview
- I.1.2 OpenACS Release Notes
- I.2 OpenACS: robust web development framework
- I.2.1 Introduction
- I.2.2 Basic infrastructure
- I.2.3 Advanced infrastructure
- I.2.4 Domain level tools
- I.1 High level information: What is OpenACS?
- II Administrator's Guide
- II.2 Installation Overview
- II.2.1 Basic Steps
- II.2.2 Prerequisite Software
- II.3 Complete Installation
- II.3.1 Install a Unix-like system and supporting software
- II.3.2 Install Oracle 10g XE on debian
- II.3.2.1 Install Oracle 8.1.7
- II.3.3 Install PostgreSQL
- II.3.4 Install AOLserver 4
- II.3.5 Quick Install of OpenACS
- II.3.5.1 Complex Install OpenACS 5.3
- II.3.6 OpenACS Installation Guide for Windows2000
- II.3.7 OpenACS Installation Guide for Mac OS X
- II.4 Configuring a new OpenACS Site
- II.4.1 Installing OpenACS packages
- II.4.2 Mounting OpenACS packages
- II.4.3 Configuring an OpenACS package
- II.4.4 Setting Permissions on an OpenACS package
- II.4.5 How Do I?
- II.4.6 Configure OpenACS look and feel with templates
- II.5 Upgrading
- II.5.1 Overview
- II.5.2 Upgrading 4.5 or higher to 4.6.3
- II.5.3 Upgrading OpenACS 4.6.3 to 5.0
- II.5.4 Upgrading an OpenACS 5.0.0 or greater installation
- II.5.5 Upgrading the OpenACS files
- II.5.6 Upgrading Platform components
- II.6 Production Environments
- II.6.1 Starting and Stopping an OpenACS instance.
- II.6.2 AOLserver keepalive with inittab
- II.6.3 Running multiple services on one machine
- II.6.4 High Availability/High Performance Configurations
- II.6.5 Staged Deployment for Production Networks
- II.6.6 Installing SSL Support for an OpenACS service
- II.6.7 Set up Log Analysis Reports
- II.6.8 External uptime validation
- II.6.9 Diagnosing Performance Problems
- II.7 Database Management
- II.7.1 Running a PostgreSQL database on another server
- II.7.2 Deleting a tablespace
- II.7.3 Vacuum Postgres nightly
- II.8 Backup and Recovery
- II.8.1 Backup Strategy
- II.8.2 Manual backup and recovery
- II.8.3 Automated Backup
- II.8.4 Using CVS for backup-recovery
- II.A Install Red Hat 8/9
- II.B Install additional supporting software
- II.B.1 Unpack the OpenACS tarball
- II.B.2 Initialize CVS (OPTIONAL)
- II.B.3 Add PSGML commands to emacs init file (OPTIONAL)
- II.B.4 Install Daemontools (OPTIONAL)
- II.B.5 Install qmail (OPTIONAL)
- II.B.6 Install Analog web file analyzer
- II.B.7 Install nspam
- II.B.8 Install Full Text Search
- II.B.9 Install Full Text Search using Tsearch2
- II.B.10 Install Full Text Search using OpenFTS (deprecated see tsearch2)
- II.B.11 Install nsopenssl
- II.B.12 Install tclwebtest.
- II.B.13 Install PHP for use in AOLserver
- II.B.14 Install Squirrelmail for use as a webmail system for OpenACS
- II.B.15 Install PAM Radius for use as external authentication
- II.B.16 Install LDAP for use as external authentication
- II.B.17 Install AOLserver 3.3oacs1
- II.C Credits
- II.C.1 Where did this document come from?
- II.C.2 Linux Install Guides
- II.C.3 Security Information
- II.C.4 Resources
- II.2 Installation Overview
- III For OpenACS Package Developers
- III.9 Development Tutorial
- III.9.1 Creating an Application Package
- III.9.2 Setting Up Database Objects
- III.9.3 Creating Web Pages
- III.9.4 Debugging and Automated Testing
- III.10 Advanced Topics
- III.10.1 Write the Requirements and Design Specs
- III.10.2 Add the new package to CVS
- III.10.3 OpenACS Edit This Page Templates
- III.10.4 Adding Comments
- III.10.5 Admin Pages
- III.10.6 Categories
- III.10.7 Profile your code
- III.10.8 Prepare the package for distribution.
- III.10.9 Distributing upgrades of your package
- III.10.10 Notifications
- III.10.11 Hierarchical data
- III.10.12 Using .vuh files for pretty urls
- III.10.13 Laying out a page with CSS instead of tables
- III.10.14 Sending HTML email from your application
- III.10.15 Basic Caching
- III.10.16 Scheduled Procedures
- III.10.17 Enabling WYSIWYG
- III.10.18 Adding in parameters for your package
- III.10.19 Writing upgrade scripts
- III.10.20 Connect to a second database
- III.10.21 Future Topics
- III.11 Development Reference
- III.11.1 OpenACS Packages
- III.11.2 OpenACS Data Models and the Object System
- III.11.3 The Request Processor
- III.11.4 The OpenACS Database Access API
- III.11.5 Using Templates in OpenACS
- III.11.6 Groups, Context, Permissions
- III.11.7 Writing OpenACS Application Pages
- III.11.8 Parties in OpenACS
- III.11.9 OpenACS Permissions Tediously Explained
- III.11.10 Object Identity
- III.11.11 Programming with AOLserver
- III.11.12 Using Form Builder: building html forms dynamically
- III.12 Engineering Standards
- III.12.1 OpenACS Style Guide
- III.12.2 Release Version Numbering
- III.12.3 Constraint naming standard
- III.12.4 ACS File Naming and Formatting Standards
- III.12.5 PL/SQL Standards
- III.12.6 Variables
- III.12.7 Automated Testing
- III.13 CVS Guidelines
- III.13.1 Using CVS with OpenACS
- III.13.2 OpenACS CVS Concepts
- III.13.3 Contributing code back to OpenACS
- III.13.4 Additional Resources for CVS
- III.14 Documentation Standards
- III.14.1 OpenACS Documentation Guide
- III.14.2 Using PSGML mode in Emacs
- III.14.3 Using nXML mode in Emacs
- III.14.4 Detailed Design Documentation Template
- III.14.5 System/Application Requirements Template
- III.15 TCLWebtest
- III.16 Internationalization
- III.16.1 Internationalization and Localization Overview
- III.16.2 How Internationalization/Localization works in OpenACS
- III.16.4 Design Notes
- III.16.5 Translator's Guide
- III.D Using CVS with an OpenACS Site
- III.9 Development Tutorial
- IV For OpenACS Platform Developers
- IV.17 Kernel Documentation
- IV.17.1 Overview
- IV.17.2 Object Model Requirements
- IV.17.3 Object Model Design
- IV.17.4 Permissions Requirements
- IV.17.5 Permissions Design
- IV.17.6 Groups Requirements
- IV.17.7 Groups Design
- IV.17.8 Subsites Requirements
- IV.17.9 Subsites Design Document
- IV.17.10 Package Manager Requirements
- IV.17.11 Package Manager Design
- IV.17.12 Database Access API
- IV.17.13 OpenACS Internationalization Requirements
- IV.17.14 Security Requirements
- IV.17.15 Security Design
- IV.17.16 Security Notes
- IV.17.17 Request Processor Requirements
- IV.17.18 Request Processor Design
- IV.17.19 Documenting Tcl Files: Page Contracts and Libraries
- IV.17.20 Bootstrapping OpenACS
- IV.17.21 External Authentication Requirements
- IV.18 Releasing OpenACS
- IV.18.1 OpenACS Core and .LRN
- IV.18.2 How to Update the OpenACS.org repository
- IV.18.3 How to package and release an OpenACS Package
- IV.18.4 How to Update the translations
- IV.17 Kernel Documentation
- V Tcl for Web Nerds
- V.1 Tcl for Web Nerds Introduction
- V.2 Basic String Operations
- V.3 List Operations
- V.4 Pattern matching
- V.5 Array Operations
- V.6 Numbers
- V.7 Control Structure
- V.8 Scope, Upvar and Uplevel
- V.9 File Operations
- V.10 Eval
- V.11 Exec
- V.12 Tcl for Web Use
- V.13 OpenACS conventions for TCL
- V.14 Solutions
- VI SQL for Web Nerds
- VI.1 SQL Tutorial
- VI.1.1 SQL Tutorial
- VI.1.2 Answers
- VI.2 SQL for Web Nerds Introduction
- VI.3 Data modeling
- VI.3.1 The Discussion Forum -- philg's personal odyssey
- VI.3.2 Data Types (Oracle)
- VI.3.4 Tables
- VI.3.5 Constraints
- VI.4 Simple queries
- VI.5 More complex queries
- VI.6 Transactions
- VI.7 Triggers
- VI.8 Views
- VI.9 Style
- VI.10 Escaping to the procedural world
- VI.11 Trees
- VI.1 SQL Tutorial
VI.2 SQL for Web Nerds Introduction
The architect of any new information system must decide how much responsibility for data management the new custom software should take and how much should be left to packaged software and the operating system. This chapter explains what kind of packaged data management software is available, covering files, flat file database management systems, the RDBMS, object-relational database management systems, and object databases. This chapter also introduces the SQL language.
What Do You Need for Transaction Processing?
Data processing folks like to talk about the "ACID test" when deciding whether or not a database management system is adequate for handling transactions. An adequate system has the following properties:
The results of a transaction are invisible to other transactions until the transaction is complete. For example, if you are running an accounting report at the same time that Joe is transferring money, the accounting report program will either see the balances before Joe transferred the money or after, but never the intermediate state where checking has been credited but savings not yet debited.
Once committed (completed), the results of a transaction are permanent and survive future system and media failures. If the airline reservation system computer gives you seat 22A and crashes a millisecond later, it won't have forgotten that you are sitting in 22A and also give it to someone else. Furthermore, if a programmer spills coffee into a disk drive, it will be possible to install a new disk and recover the transactions up to the coffee spill, showing that you had seat 22A.
That doesn't sound too tough to implement, does it? And, after all, one of the most refreshing things about the Web is how it encourages people without formal computer science backgrounds to program. So why not build your Internet bank on a transaction system implemented by an English major who has just discovered Perl?
Because you still need indexing.
Finding Your Data (and Fast)
One facet of a database management system is processing inserts, updates, and deletes. This all has to do with putting information into the database. Sometimes it is also nice, though, to be able to get data out. And with popular sites getting 100 hits per second, it pays to be conscious of speed.
Flat files work okay if they are very small. A Perl script can read the whole file into memory in a split second and then look through it to pull out the information requested. But suppose that your on-line bank grows to have 250,000 accounts. A user types his account number into a Web page and asks for his most recent deposits. You've got a chronological financial transactions file with 25 million entries. Crunch, crunch, crunch. Your server laboriously works through all 25 million to find the ones with an account number that matches the user's. While it is crunching, 25 other users come to the Web site and ask for the same information about their accounts.
You have two choices: (1) buy a 64-processor Sun E10000 server with 64 GB of RAM, or (2) build an index file. If you build an index file that maps account numbers to sequential transaction numbers, your server won't have to search all 25 million records anymore. However, you have to modify all of your programs that insert, update, or delete from the database to also keep the index current.
This works great until two years later when a brand new MBA arrives from Harvard. She asks your English major cum Perl hacker for "a report of all customers who have more than $5,000 in checking or live in Oklahoma and have withdrawn more than $100 from savings in the last 17 days." It turns out that you didn't anticipate this query so your indexing scheme doesn't speed things up. Your server has to grind through all the data over and over again.
Enter the Relational Database
You are building a cutting-edge Web service. You need the latest and greatest in computer technology. That's why you use, uh, Unix. Yeah. Anyway, even if your operating system was developed in 1969, you definitely can't live without the most modern database management system available. Maybe this guy E.F. Codd can help:
"Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation). ... Activities of users at terminals and most application programs should remain unaffected when the internal representation of data is changed and even when some aspects of the external representation are changed. Changes in data representation will often be needed as a result of changes in query, update, and report traffic and natural growth in the types of stored information."Existing noninferential, formatted data systems provide users with tree-structured files or slightly more general network models of the data. In Section 1, inadequacies of these models are discussed. A model based on n-ary relations, a normal form for data base relations, and the concept of a universal data sublanguage are introduced. In Section 2, certain operations on relations (other than logical inference) are discussed and applied to the problems of redundancy and consistency in the user's model."
Sounds pretty spiffy, doesn't it? Just like what you need. That's the abstract to "A Relational Model of Data for Large Shared Data Banks", a paper Codd wrote while working at IBM's San Jose research lab. It was published in the Communications of the ACM in June, 1970.
From an application programmer's point of view, the biggest innovation in the relational database is that one uses a declarative query language, SQL (an acronym for Structured Query Language and pronounced "ess-cue-el" or "sequel"). Most computer languages are procedural. The programmer tells the computer what to do, step by step, specifying a procedure. In SQL, the programmer says "I want data that meet the following criteria" and the RDBMS query planner figures out how to get it. There are two advantages to using a declarative language. The first is that the queries no longer depend on the data representation. The RDBMS is free to store data however it wants. The second is increased software reliability. It is much harder to have "a little bug" in an SQL query than in a procedural program. Generally it either describes the data that you want and works all the time or it completely fails in an obvious way.
Another benefit of declarative languages is that less sophisticated users are able to write useful programs. For example, many computing tasks that required professional programmers in the 1960s can be accomplished by non-technical people with spreadsheets. In a spreadsheet, you don't tell the computer how to work out the numbers or in what sequence. You just declare "This cell will be 1.5 times the value of that other cell over there."
How Does This RDBMS Thing Work?
This is all you need to know to be a Caveman Database Programmer: A relational database is a big spreadsheet that several people can update simultaneously.
Each table in the database is one
spreadsheet. You tell the RDBMS how many columns each row has. For
example, in our mailing list database, the table has two columns: name
and email
.
Each entry in the database consists of one row in this table. An RDBMS
is more restrictive than a spreadsheet in that all the data in one
column must be of the same type, e.g., integer, decimal, character
string, or date. Another difference between a spreadsheet and an RDBMS
is that the rows in an RDBMS are not ordered. You can have a column
named row_number
and ask the RDBMS to return the rows
ordered according to the data in this column, but the row numbering is
not implicit as it would be with Visicalc or its derivatives such as
Lotus 1-2-3 and Excel. If you do define a row_number
column or some other unique identifier for rows in a table, it becomes
possible for a row in another table to refer to that row by including
the value of the unique ID.
Here's what some SQL looks like for the mailing list application
create table mailing_list (
email varchar(100) not null primary key,
name varchar(100)
);
The table will be called mailing_list
and will have two columns, both variable length character strings. We've added a couple of integrity constraints on the email
column. The not null
will prevent any program from inserting a row where name
is specified but email
is not. After all, the whole point of the system is to send people
e-mail so there isn't much value in having a name with no e-mail
address. The primary key
tells the database that this
column's value can be used to uniquely identify a row. That means the
system will reject an attempt to insert a row with the same e-mail
address as an existing row. This sounds like a nice feature, but it can
have some unexpected performance implications. For example, every time
anyone tries to insert a row into this table, the RDBMS will have to
look at all the other rows in the table to make sure that there isn't
already one with the same e-mail address. For a really huge table, that
could take minutes, but if you had also asked the RDBMS to create an
index for mailing_list
on email
then the
check becomes almost instantaneous. However, the integrity constraint
still slows you down because every update to the mailing_list
table will also require an update to the index and therefore you'll be doing twice as many writes to the hard disk.
That is the joy and the agony of SQL. Inserting two innocuous looking words can cost you a factor of 1000 in performance. Then inserting a sentence (to create the index) can bring you back so that it is only a factor of two or three. (Note that many RDBMS implementations, including Oracle, automatically define an index on a column that is constrained to be unique.)
Anyway, now that we've executed the Data Definition Language "create table" statement, we can move on to Data Manipulation Language: an INSERT.
insert into mailing_list (name, email)
values ('Philip Greenspun','philg@mit.edu');
Note that we specify into which columns we are inserting. That way, if someone comes along later and does
alter table mailing_list add (phone_number varchar(20));
the Oracle syntax for adding a column), our INSERT will still work. Note also that the string quoting character in SQL is a single quote. If you want to have a single quote in the string, double the single quote in "O'Grady":
insert into mailing_list (name, email)
values ('Michael O''Grady','ogrady@fastbuck.com');
Having created a table and inserted some data, at last we are ready to experience the awesome power of the SQL SELECT. Want your data back?
SQL> select * from mailing_list;
EMAIL NAME PHONE_NUMBER
------------------------- ------------------------- ------------
philg@mit.edu Philip Greenspun
ogrady@fastbuck.com Michael O'Grady
2 rows selected.
Note that there are no values in the phone_number
column because we haven't set any. As soon as we do start to add phone
numbers, we realize that our data model was inadequate. This is the
Internet and Joe Typical User will have his pants hanging around his
knees under the weight of a cell phone, beeper, and other personal
communication accessories. One phone number column is clearly
inadequate and even work_phone
and home_phone
columns won't accommodate the wealth of information users might want to
give us. The clean database-y way to do this is to remove our phone_number
column from the mailing_list
table and define a helper table just for the phone numbers. Removing or
renaming a column is easy nowadays, but for the beauty of clean code we
drop the old table and create two new ones:
drop table mailing_list;
create table mailing_list (
email varchar(100) not null primary key,
name varchar(100)
);
create table phone_numbers (
email varchar(100) not null references mailing_list(email),
number_type varchar(15) check (number_type in ('work','home','cell','beeper')),
phone_number varchar(20) not null
);
Note that in this table the email column is not
a primary key. That's because we want to allow multiple rows with the
same e-mail address. If you are hanging around with a database nerd
friend, you can say that there is a relationship between the rows in the phone_numbers
table and the mailing_list
table. In fact, you can say that it is a many-to-one relation because many rows in the phone_numbers
table may correspond to only one row in the mailing_list
table.
Another item worth noting about our two-table data model is that we do not store the user's name in the phone_numbers
table. That would be redundant with the mailing_list
table and potentially self-redundant as well, if, for example,
"robert.loser@fastbuck.com" says he is "Robert Loser" when he types in
his work phone and then "Rob Loser" when he puts in his beeper number,
and "Bob Lsr" when he puts in his cell phone number while typing on his
laptop's cramped keyboard.
Anyway, enough database nerdism. Let's populate the phone_numbers
table:
Ooops! When we dropped the
SQL> insert into phone_numbers values ('ogrady@fastbuck.com','work','(800) 555-1212');
ORA-02291: integrity constraint (SCOTT.SYS_C001080) violated - parent key not found
mailing_list
table, we lost all the rows. The phone_numbers
table has a referential integrity constraint ("references
mailing_list") to make sure that we don't record e-mail addresses for
people whose names we don't know. We have to first insert the two users
into mailing_list
: Note that the last four INSERTs use an evil SQL shortcut and don't specify the columns into which we are inserting data. The system defaults to using all the columns in the order that they were defined. Except for prototyping and playing around, we don't recommend ever using this shortcut.
insert into mailing_list (name, email)
values ('Philip Greenspun','philg@mit.edu');
insert into mailing_list (name, email)
values ('Michael O''Grady','ogrady@fastbuck.com');
insert into phone_numbers values ('ogrady@fastbuck.com','work','(800) 555-1212');
insert into phone_numbers values ('ogrady@fastbuck.com','home','(617) 495-6000');
insert into phone_numbers values ('philg@mit.edu','work','(617) 253-8574');
insert into phone_numbers values ('ogrady@fastbuck.com','beper','(617) 222-3456');
The first three INSERTs work fine, but what about the last one, where Mr. O'Grady misspelled "beeper"?
We asked Oracle at table definition time toORA-02290: check constraint (SCOTT.SYS_C001079) violated
check (number_type in ('work','home','cell','beeper'))
and it did. The database cannot be left in an inconsistent state.
Let's say we want all of our data out. Email, full name, phone numbers. The most obvious query to try is a join.
Yow! What happened? There are only two rows in the
SQL> select * from mailing_list, phone_numbers;
EMAIL NAME EMAIL TYPE NUMBER
---------------- ---------------- ---------------- ------ --------------
philg@mit.edu Philip Greenspun ogrady@fastbuck. work (800) 555-1212
ogrady@fastbuck. Michael O'Grady ogrady@fastbuck. work (800) 555-1212
philg@mit.edu Philip Greenspun ogrady@fastbuck. home (617) 495-6000
ogrady@fastbuck. Michael O'Grady ogrady@fastbuck. home (617) 495-6000
philg@mit.edu Philip Greenspun philg@mit.edu work (617) 253-8574
ogrady@fastbuck. Michael O'Grady philg@mit.edu work (617) 253-8574
6 rows selected.
mailing_list
table and three in the phone_numbers
table. Yet here we have six rows back. This is how joins work. They give you the Cartesian product
of the two tables. Each row of one table is paired with all the rows of
the other table in turn. So if you join an N-row table with an M-row
table, you get back a result with N*M rows. In real databases, N and M
can be up in the millions so it is worth being a little more specific
as to which rows you want: Probably more like what you had in mind. Refining your SQL statements in this manner can sometimes be more exciting. For example, let's say that you want to get rid of Philip Greenspun's phone numbers but aren't sure of the exact syntax.
select *
from mailing_list, phone_numbers
where mailing_list.email = phone_numbers.email;
EMAIL NAME EMAIL TYPE NUMBER
---------------- ---------------- ---------------- ------ --------------
ogrady@fastbuck. Michael O'Grady ogrady@fastbuck. work (800) 555-1212
ogrady@fastbuck. Michael O'Grady ogrady@fastbuck. home (617) 495-6000
philg@mit.edu Philip Greenspun philg@mit.edu work (617) 253-8574
3 rows selected.
Oops. Yes, this does actually delete all the rows in the table. You probably wish you'd typed
SQL> delete from phone_numbers;
3 rows deleted.
but it is too late now. Therefore be careful before executing "delete".
delete from phone_numbers where email = 'philg@mit.edu';
There is one more fundamental SQL statement to learn. Suppose that Philip moves to Hollywood to realize his long-standing dream of becoming a major motion picture producer. Clearly a change of name is in order, though he'd be reluctant to give up the e-mail address he's had since 1976. Here's the SQL:
SQL> update mailing_list set name = 'Phil-baby Greenspun' where email = 'philg@mit.edu';
1 row updated.
SQL> select * from mailing_list;
EMAIL NAME
-------------------- --------------------
philg@mit.edu Phil-baby Greenspun
ogrady@fastbuck.com Michael O'Grady
2 rows selected.
As with DELETE, don't play around with UPDATE statements unless you have a WHERE clause at the end.
---
based on SQL for Web Nerds