test-doc - The Discussion Forum -- philg's personal odyssey


VI.3.1 The Discussion Forum -- philg's personal odyssey

Back in 1995, I built a threaded discussion forum, described ad nauseam in http://philip.greenspun.com/wtr/dead-trees/53013.htm. Here's how I stored the postings:


create table bboard (
        msg_id          char(6) not null primary key,
        refers_to       char(6),
        email           varchar(200),
        name            varchar(200),
        one_line        varchar(700),
        message         clob,
        notify          char(1) default 'f' check (notify in ('t','f')),
        posting_time    date,
        sort_key        varchar(600)
);

German order reigns inside the system itself: messages are uniquely keyed with msg_id, refer to each other (i.e., say "I'm a response to msg X") with refers_to, and a thread can be displayed conveniently by using the sort_key column.
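As a rough illustration of how a sort_key column can make thread display a single ordered query, here is a minimal sketch using Python's sqlite3 module as a stand-in for Oracle. The key format is an assumption made for this example (a top-level message uses its own msg_id, and a reply appends a two-character suffix to its parent's key); the text above does not specify the actual format, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table bboard (
        msg_id    char(6) not null primary key,
        refers_to char(6),
        one_line  varchar(700),
        sort_key  varchar(600)
    )
""")

# ASSUMED key format (not from the original system): a top-level message
# uses its own msg_id; a reply uses its parent's sort_key plus two characters.
rows = [
    ("000001", None,     "Which lens?",       "000001"),
    ("000003", "000001", "Try the 50mm",      "00000100"),
    ("000002", None,     "Film or digital?",  "000002"),
    ("000004", "000003", "Agreed",            "0000010000"),
    ("000005", "000001", "Depends on budget", "00000101"),
]
conn.executemany("insert into bboard values (?, ?, ?, ?)", rows)

def thread_listing(conn):
    """Return subject lines indented by reply depth, in thread order."""
    out = []
    for one_line, sort_key in conn.execute(
        "select one_line, sort_key from bboard order by sort_key"
    ):
        depth = (len(sort_key) - 6) // 2  # two extra characters per reply level
        out.append("  " * depth + one_line)
    return out

listing = thread_listing(conn)
for line in listing:
    print(line)
```

Because every reply's key starts with its parent's key, a plain string sort places each reply directly under the message it answers, with no recursive query needed.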

Italian chaos is permitted in the email and name columns; users could remain anonymous, masquerade as "president@whitehouse.gov" or give any name.

This seemed like a good idea when I built the system. I was concerned that it work reliably. I didn't care whether or not users put in bogus content; the admin pages made it really easy to remove such postings and, in any case, if someone had something interesting to say but needed to remain anonymous, why should the system reject their posting?

One hundred thousand postings later, as the moderator of the photo.net Q&A forum, I began to see the dimensions of my data modeling mistakes.

First, anonymous postings and fake email addresses didn't come from Microsoft employees revealing the dark truth about their evil bosses. They came from complete losers trying and failing to be funny or wishing to humiliate other readers. Some fake addresses came from people scared by the rising tide of spam email (not a serious problem back in 1995).

Second, I didn't realize how the combination of my email alert systems, fake email addresses, and Unix mailers would result in my personal mailbox filling up with messages that couldn't be delivered to "asdf@asdf.com" or "duh@duh.net".

Although the solution involved changing some Web scripts, fundamentally the fix was to add a column to store the IP address from which a post was made:


alter table bboard add (originating_ip	varchar(16));

Keeping these data enabled me to see that most of the anonymous posters were people who'd been using the forum for some time, typically from the same IP address. I just sent them mail and asked them to stop, explaining the problem with bounced email.
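The kind of lookup this column makes possible might be sketched as follows. This is an illustration only, run against SQLite through Python's sqlite3 module rather than Oracle; the sample rows, the example.com addresses, and the list of "suspect" addresses are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table bboard (
        msg_id         char(6) not null primary key,
        email          varchar(200),
        name           varchar(200),
        originating_ip varchar(16)
    )
""")
conn.executemany("insert into bboard values (?, ?, ?, ?)", [
    ("000001", "joe@example.com",  "Joe Smith", "192.0.2.16"),
    ("000002", "asdf@asdf.com",    "asdf",      "192.0.2.16"),
    ("000003", "jane@example.com", "Jane Doe",  "192.0.2.77"),
    ("000004", "duh@duh.net",      "duh",       "192.0.2.16"),
])

# Addresses that bounced; in practice this list would come from the mail logs.
suspect = ("asdf@asdf.com", "duh@duh.net")

# Self-join: find real-looking addresses that posted from the same IP
# as one of the bogus addresses.
query = """
    select distinct known.email
    from bboard known
    join bboard bogus on bogus.originating_ip = known.originating_ip
    where bogus.email in (?, ?)
      and known.email not in (?, ?)
    order by known.email
"""
likely_authors = [row[0] for row in conn.execute(query, suspect + suspect)]
print(likely_authors)
```

A shared IP address is only a hint, of course: several people behind one proxy would look identical to this query.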

After four years of operating the photo.net community, it became apparent that we needed ways to

display site history for users who had changed their email addresses
discourage problem users from burdening the moderators and the community
carefully tie together user-contributed content in the various subsystems of photo.net

The solution was obvious to any experienced database nerd: a canonical users table and then content tables that reference it. Here's a simplified version of the data model, taken from a toolkit for building online communities, described in http://philip.greenspun.com/panda/community:


create table users (
	user_id			integer not null primary key,
	first_names		varchar(100) not null,
	last_name		varchar(100) not null,
	email			varchar(100) not null unique,
        ..
);

create table bboard (
	msg_id		char(6) not null primary key,
	refers_to	char(6),
	topic		varchar(100) not null references bboard_topics,
	category	varchar(200),	-- only used for categorized Q&A forums
	originating_ip	varchar(16),	-- stored as string, separated by periods
	user_id		integer not null references users,
	one_line	varchar(700),
	message		clob,
	-- html_p - is the message in html or not
	html_p		char(1) default 'f' check (html_p in ('t','f')),
        ...
);

create table classified_ads (
	classified_ad_id	integer not null primary key,
	user_id			integer not null references users,
	...
);

Note that a contributor's name and email address no longer appear in the bboard table. That doesn't mean we don't know who posted a message. In fact, this data model can't even represent an anonymous posting: user_id integer not null references users requires that each posting be associated with a user ID and that there actually be a row in the users table with that ID.
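To see the constraint at work, here is a small sketch that uses SQLite (via Python's sqlite3 module) in place of Oracle, with the tables cut down to the columns that matter. The helper name try_post and the sample data are made up for the illustration, and the error messages are SQLite's, not Oracle's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # SQLite leaves enforcement off by default
conn.executescript("""
    create table users (
        user_id     integer not null primary key,
        first_names varchar(100) not null,
        last_name   varchar(100) not null,
        email       varchar(100) not null unique
    );
    create table bboard (
        msg_id   char(6) not null primary key,
        user_id  integer not null references users,
        one_line varchar(700)
    );
    insert into users values (1, 'Joe', 'Smith', 'jsmith@example.com');
""")

def try_post(msg_id, user_id):
    """Attempt an insert; return 'ok' or the text of the constraint error."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "insert into bboard values (?, ?, ?)",
                (msg_id, user_id, "hello"),
            )
        return "ok"
    except sqlite3.IntegrityError as err:
        return str(err)

results = [
    try_post("000001", 1),     # existing user: accepted
    try_post("000002", 99),    # no such row in users: rejected
    try_post("000003", None),  # no user at all: rejected by not null
]
for outcome in results:
    print(outcome)
```

The second and third inserts fail for different reasons, one from the references clause and one from not null, which together are what rule out an anonymous posting.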

First, let's talk about how much fun it is to move a live-on-the-Web 600,000 hit/day service from one data model to another. In this case, note that the original bboard data model had a single name column. The community system has separate columns for first and last names. A conversion script can easily split up "Joe Smith" but what is it to do with William Henry Gates III?
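A naive conversion routine makes the difficulty concrete. The sketch below is a hypothetical helper, not the script actually used; it assumes the rule "the last word is the last name", which is exactly the kind of rule that breaks.

```python
def split_name(full_name):
    """Naively split a name: the last word becomes last_name,
    everything before it becomes first_names."""
    parts = full_name.split()
    if len(parts) < 2:
        return (full_name.strip(), "")
    return (" ".join(parts[:-1]), parts[-1])

easy = split_name("Joe Smith")
hard = split_name("William Henry Gates III")
print(easy)
print(hard)
```

The first name comes out as expected, but the second ends up with a last_name of "III", and no simple rule covers suffixes, multi-word surnames, and single-word names all at once.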

How do we copy over anonymous postings? Remember that Oracle is not flexible or intelligent. We said that we wanted every row in the bboard table to reference a row in the users table. Oracle will abort any transaction that would result in a violation of this integrity constraint. So we either have to drop all those anonymous postings (and any non-anonymous postings that refer to them) or we have to create a user called "Anonymous" and assign all the anonymous postings to that person. The technical term for this kind of solution is kludge.
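The "Anonymous" kludge might look something like the following sketch, again with SQLite standing in for Oracle. The table name old_bboard, the placeholder user_id of 0, and the sample rows are assumptions made for the illustration, and real users are matched on email address only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")
conn.executescript("""
    create table old_bboard (
        msg_id char(6) not null primary key,
        email  varchar(200),
        name   varchar(200)
    );
    create table users (
        user_id     integer not null primary key,
        first_names varchar(100) not null,
        last_name   varchar(100) not null,
        email       varchar(100) not null unique
    );
    create table bboard (
        msg_id  char(6) not null primary key,
        user_id integer not null references users
    );
    insert into old_bboard values ('000001', 'jsmith@example.com', 'Joe Smith');
    insert into old_bboard values ('000002', null, null);
    insert into old_bboard values ('000003', '', 'nobody');
""")

ANONYMOUS_ID = 0  # hypothetical placeholder row that absorbs unattributed postings

with conn:
    conn.execute(
        "insert into users values (?, 'Anonymous', 'User', 'anonymous@example.com')",
        (ANONYMOUS_ID,),
    )
    conn.execute("insert into users values (1, 'Joe', 'Smith', 'jsmith@example.com')")
    # Postings whose email matches no user fall through to the placeholder.
    conn.execute("""
        insert into bboard (msg_id, user_id)
        select o.msg_id, coalesce(u.user_id, ?)
        from old_bboard o
        left join users u on u.email = o.email
    """, (ANONYMOUS_ID,))

assignments = dict(conn.execute("select msg_id, user_id from bboard"))
print(assignments)
```

Every posting now satisfies the integrity constraint, at the cost of a users row that does not describe any actual person.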

A more difficult problem than anonymous postings is presented by long-time users who have difficulty typing and/or keeping a job. Consider a user who has identified himself as

Joe Smith; jsmith@ibm.com
Jo Smith; jsmith@ibm.com (typo in name)
Joseph Smith; jsmth@ibm.com (typo in email)
Joe Smith; cantuseworkaddr@hotmail.com (new IBM policy)
Joe Smith-Jones; joe_smithjones@hp.com (got married, changed name, changed jobs)
Joe Smith-Jones; jsmith@somedivision.hp.com (valid but not canonical corporate email address)
Josephina Smith; jsmith@somedivision.hp.com (sex change; divorce)
Josephina Smith; josephina_smith@hp.com (new corporate address)
Siddhartha Bodhisattva; josephina_smith@hp.com (change of philosophy)
Siddhartha Bodhisattva; thinkwaitfast@hotmail.com (traveling for a while to find enlightenment)

Contemporary community members all recognize these postings as coming from the same person but it would be very challenging even to build a good semi-automated means of merging postings from this person into one user record.

Once we've copied everything into this new normalized data model, notice that we can't dig ourselves into the same hole again. If a user has contributed 1000 postings, we don't have 1000 different records of that person's name and email address. If a user changes jobs, we need only update one column in one row in one table.

The html_p column in the new data model is worth mentioning. In 1995, I didn't understand the problems of user-submitted data. Some users will submit plain text, which seems simple, but in fact you can't just spit this out as HTML. If user A typed < or > characters, they might get swallowed by user B's Web browser. Does this matter? Consider that "<g>" is interpreted in various online circles as an abbreviation for "grin" but by Netscape Navigator as an unrecognized (and therefore ignored) HTML tag. Compare the meaning of

"We shouldn't think it unfair that Bill Gates has more wealth than the 100 million poorest Americans combined. After all, he invented the personal computer, the graphical user interface, and the Internet."

with

"We shouldn't think it unfair that Bill Gates has more wealth than the 100 million poorest Americans combined. After all, he invented the personal computer, the graphical user interface, and the Internet. <g>"

It would have been easy enough for me to make sure that such characters never got interpreted as markup. In fact, with AOLserver one can do it with a single call to the built-in procedure ns_quotehtml. However, consider the case where a nerd posts some HTML. Other users would then see

"For more examples of my brilliant thinking and modesty, check out <a href="http://philip.greenspun.com/">my home page</a>."

I discovered that the only real solution is to ask the user whether the submission is an HTML fragment or plain text, show the user an approval page where the content may be previewed, and then remember what the user told us in an html_p column in the database.
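The display side of that decision can be sketched in a few lines. This example uses Python's standard html.escape as a stand-in for AOLserver's ns_quotehtml; the function name render_message is hypothetical.

```python
import html

def render_message(message, html_p):
    """Return text ready to place in a page, honoring the poster's choice.

    html_p is 't' if the poster said the message is an HTML fragment,
    'f' if it is plain text (the same values the column check allows).
    """
    if html_p == "t":
        # Passed through as markup; a real site would likely filter it too.
        return message
    # Plain text: <, >, & and quotes become entities so they display literally.
    return html.escape(message)

plain = render_message("and the Internet. <g>", "f")
fragment = render_message('<a href="http://philip.greenspun.com/">my home page</a>', "t")
print(plain)
print(fragment)
```

With the flag stored alongside the message, the grin survives in the plain-text posting and the link still works in the HTML one.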

Is this data model perfect? Permanent? Absolutely. It will last for at least... Whoa! Wait a minute. I didn't know that Dave Clark was replacing his original Internet Protocol, which the world has been running since around 1980, with IPv6 (http://www.faqs.org/rfcs/rfc2460.html). In the near future, we'll have IP addresses that are 128 bits long. That's 16 bytes, each of which takes two hex characters to represent. So we need 32 characters plus 7 more for the colons that separate the eight groups of hex digits. We might also need a couple of characters in front to say "this is a hex representation". Thus our brand new data model in fact has a crippling deficiency. How easy is it to fix? In Oracle:


alter table bboard modify (originating_ip varchar(50));

You won't always get off this easy. Oracle won't let you shrink a column from a maximum of 50 characters to 16, even if no row has a value longer than 16 characters. Oracle also makes it tough to add a column that is constrained not null.
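As a check on the address-length arithmetic behind that varchar(50), the following sketch uses Python's standard ipaddress module. It assumes the usual colon-separated text notation for IPv6 addresses; the mixed form with a dotted IPv4 tail is included because it is the longest common spelling.

```python
import ipaddress

# The numerically largest address, written out in full with no abbreviation.
longest = ipaddress.IPv6Address("ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff")
full_form = longest.exploded

hex_digits = 128 // 4  # 128 bits at 4 bits per hex digit
separators = 8 - 1     # eight groups of four digits, seven separators

# Mixed notation: six hex groups followed by a dotted IPv4 address.
mixed_form = "ffff:ffff:ffff:ffff:ffff:ffff:255.255.255.255"

print(len(full_form), hex_digits + separators)
print(len(mixed_form))
```

Both lengths fit in 50 characters, while neither fits in the original 16.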
