test-doc - Data modeling

I OpenACS For Everyone
- I.1 High level information: What is OpenACS?
  - I.1.1 Overview
  - I.1.2 OpenACS Release Notes
- I.2 OpenACS: robust web development framework
  - I.2.1 Introduction
  - I.2.2 Basic infrastructure
  - I.2.3 Advanced infrastructure
  - I.2.4 Domain level tools
II Administrator's Guide
- II.2 Installation Overview
  - II.2.1 Basic Steps
  - II.2.2 Prerequisite Software
- II.3 Complete Installation
  - II.3.1 Install a Unix-like system and supporting software
  - II.3.2 Install Oracle 10g XE on debian
    - II.3.2.1 Install Oracle 8.1.7
  - II.3.3 Install PostgreSQL
  - II.3.4 Install AOLserver 4
  - II.3.5 Quick Install of OpenACS
    - II.3.5.1 Complex Install OpenACS 5.3
  - II.3.6 OpenACS Installation Guide for Windows2000
  - II.3.7 OpenACS Installation Guide for Mac OS X
- II.4 Configuring a new OpenACS Site
  - II.4.1 Installing OpenACS packages
  - II.4.2 Mounting OpenACS packages
  - II.4.3 Configuring an OpenACS package
  - II.4.4 Setting Permissions on an OpenACS package
  - II.4.5 How Do I?
  - II.4.6 Configure OpenACS look and feel with templates
- II.5 Upgrading
  - II.5.1 Overview
  - II.5.2 Upgrading 4.5 or higher to 4.6.3
  - II.5.3 Upgrading OpenACS 4.6.3 to 5.0
  - II.5.4 Upgrading an OpenACS 5.0.0 or greater installation
  - II.5.5 Upgrading the OpenACS files
  - II.5.6 Upgrading Platform components
- II.6 Production Environments
  - II.6.1 Starting and Stopping an OpenACS instance.
  - II.6.2 AOLserver keepalive with inittab
  - II.6.3 Running multiple services on one machine
  - II.6.4 High Availability/High Performance Configurations
  - II.6.5 Staged Deployment for Production Networks
  - II.6.6 Installing SSL Support for an OpenACS service
  - II.6.7 Set up Log Analysis Reports
  - II.6.8 External uptime validation
  - II.6.9 Diagnosing Performance Problems
- II.7 Database Management
  - II.7.1 Running a PostgreSQL database on another server
  - II.7.2 Deleting a tablespace
  - II.7.3 Vacuum Postgres nightly
- II.8 Backup and Recovery
  - II.8.1 Backup Strategy
  - II.8.2 Manual backup and recovery
  - II.8.3 Automated Backup
  - II.8.4 Using CVS for backup-recovery
- II.A Install Red Hat 8/9
- II.B Install additional supporting software
  - II.B.1 Unpack the OpenACS tarball
  - II.B.2 Initialize CVS (OPTIONAL)
  - II.B.3 Add PSGML commands to emacs init file (OPTIONAL)
  - II.B.4 Install Daemontools (OPTIONAL)
  - II.B.5 Install qmail (OPTIONAL)
  - II.B.6 Install Analog web file analyzer
  - II.B.7 Install nspam
  - II.B.8 Install Full Text Search
  - II.B.9 Install Full Text Search using Tsearch2
  - II.B.10 Install Full Text Search using OpenFTS (deprecated see tsearch2)
  - II.B.11 Install nsopenssl
  - II.B.12 Install tclwebtest.
  - II.B.13 Install PHP for use in AOLserver
  - II.B.14 Install Squirrelmail for use as a webmail system for OpenACS
  - II.B.15 Install PAM Radius for use as external authentication
  - II.B.16 Install LDAP for use as external authentication
  - II.B.17 Install AOLserver 3.3oacs1
- II.C Credits
  - II.C.1 Where did this document come from?
  - II.C.2 Linux Install Guides
  - II.C.3 Security Information
  - II.C.4 Resources
III For OpenACS Package Developers
- III.9 Development Tutorial
  - III.9.1 Creating an Application Package
  - III.9.2 Setting Up Database Objects
  - III.9.3 Creating Web Pages
  - III.9.4 Debugging and Automated Testing
- III.10 Advanced Topics
  - III.10.1 Write the Requirements and Design Specs
  - III.10.2 Add the new package to CVS
  - III.10.3 OpenACS Edit This Page Templates
  - III.10.4 Adding Comments
  - III.10.5 Admin Pages
  - III.10.6 Categories
  - III.10.7 Profile your code
  - III.10.8 Prepare the package for distribution.
  - III.10.9 Distributing upgrades of your package
  - III.10.10 Notifications
  - III.10.11 Hierarchical data
  - III.10.12 Using .vuh files for pretty urls
  - III.10.13 Laying out a page with CSS instead of tables
  - III.10.14 Sending HTML email from your application
  - III.10.15 Basic Caching
  - III.10.16 Scheduled Procedures
  - III.10.17 Enabling WYSIWYG
  - III.10.18 Adding in parameters for your package
  - III.10.19 Writing upgrade scripts
  - III.10.20 Connect to a second database
  - III.10.21 Future Topics
- III.11 Development Reference
  - III.11.1 OpenACS Packages
  - III.11.2 OpenACS Data Models and the Object System
  - III.11.3 The Request Processor
  - III.11.4 The OpenACS Database Access API
  - III.11.5 Using Templates in OpenACS
  - III.11.6 Groups, Context, Permissions
  - III.11.7 Writing OpenACS Application Pages
  - III.11.8 Parties in OpenACS
  - III.11.9 OpenACS Permissions Tediously Explained
  - III.11.10 Object Identity
  - III.11.11 Programming with AOLserver
  - III.11.12 Using Form Builder: building html forms dynamically
- III.12 Engineering Standards
  - III.12.1 OpenACS Style Guide
  - III.12.2 Release Version Numbering
  - III.12.3 Constraint naming standard
  - III.12.4 ACS File Naming and Formatting Standards
  - III.12.5 PL/SQL Standards
  - III.12.6 Variables
  - III.12.7 Automated Testing
- III.13 CVS Guidelines
  - III.13.1 Using CVS with OpenACS
  - III.13.2 OpenACS CVS Concepts
  - III.13.3 Contributing code back to OpenACS
  - III.13.4 Additional Resources for CVS
- III.14 Documentation Standards
  - III.14.1 OpenACS Documentation Guide
  - III.14.2 Using PSGML mode in Emacs
  - III.14.3 Using nXML mode in Emacs
  - III.14.4 Detailed Design Documentation Template
  - III.14.5 System/Application Requirements Template
- III.15 TCLWebtest
  - III.15.1 API test
  - III.15.2 Webtest
- III.16 Internationalization
  - III.16.1 Internationalization and Localization Overview
  - III.16.2 How Internationalization/Localization works in OpenACS
  - III.16.4 Design Notes
  - III.16.5 Translator's Guide
- III.D Using CVS with an OpenACS Site
IV For OpenACS Platform Developers
- IV.17 Kernel Documentation
  - IV.17.1 Overview
  - IV.17.2 Object Model Requirements
  - IV.17.3 Object Model Design
  - IV.17.4 Permissions Requirements
  - IV.17.5 Permissions Design
  - IV.17.6 Groups Requirements
  - IV.17.7 Groups Design
  - IV.17.8 Subsites Requirements
  - IV.17.9 Subsites Design Document
  - IV.17.10 Package Manager Requirements
  - IV.17.11 Package Manager Design
  - IV.17.12 Database Access API
  - IV.17.13 OpenACS Internationalization Requirements
  - IV.17.14 Security Requirements
  - IV.17.15 Security Design
  - IV.17.16 Security Notes
  - IV.17.17 Request Processor Requirements
  - IV.17.18 Request Processor Design
  - IV.17.19 Documenting Tcl Files: Page Contracts and Libraries
  - IV.17.20 Bootstrapping OpenACS
  - IV.17.21 External Authentication Requirements
- IV.18 Releasing OpenACS
  - IV.18.1 OpenACS Core and .LRN
  - IV.18.2 How to Update the OpenACS.org repository
  - IV.18.3 How to package and release an OpenACS Package
  - IV.18.4 How to Update the translations
V Tcl for Web Nerds
- V.1 Tcl for Web Nerds Introduction
- V.2 Basic String Operations
- V.3 List Operations
- V.4 Pattern matching
- V.5 Array Operations
- V.6 Numbers
- V.7 Control Structure
- V.8 Scope, Upvar and Uplevel
- V.9 File Operations
- V.10 Eval
- V.11 Exec
- V.12 Tcl for Web Use
- V.13 OpenACS conventions for TCL
- V.14 Solutions
VI SQL for Web Nerds
- VI.1 SQL Tutorial
  - VI.1.1 SQL Tutorial
  - VI.1.2 Answers
- VI.2 SQL for Web Nerds Introduction
- VI.3 Data modeling
  - VI.3.1 The Discussion Forum -- philg's personal odyssey
  - VI.3.2 Data Types (Oracle)
  - VI.3.4 Tables
  - VI.3.5 Constraints
- VI.4 Simple queries
- VI.5 More complex queries
- VI.6 Transactions
- VI.7 Triggers
- VI.8 Views
- VI.9 Style
- VI.10 Escaping to the procedural world
- VI.11 Trees

94.29%

· Index

VI.3 Data modeling

Data modeling is the hardest and most important activity in the RDBMS world. If you get the data model wrong, your application might not do what users need, it might be unreliable, it might fill up the database with garbage. Why then do we start a SQL tutorial with the most challenging part of the job? Because you can't do queries, inserts, and updates until you've defined some tables. And defining tables is data modeling.

When data modeling, you are telling the RDBMS the following:

what elements of the data you will store
how large each element can be
what kind of information each element can contain
what elements may be left blank
which elements are constrained to a fixed range
whether and how various tables are to be linked

Three-Valued Logic

Programmers in most computer languages are familiar with Boolean logic. A variable may be either true or false. Pervading SQL, however, is the alien idea of three-valued logic. A column can be true, false, or NULL. When building the data model you must affirmatively decide whether a NULL value will be permitted for a column and, if so, what it means.

For example, consider a table for recording user-submitted comments to a Web site. The publisher has made the following stipulations:

comments won't go live until approved by an editor
the admin pages will present editors with all comments that are pending approval, i.e., have been submitted but neither approved nor disapproved by an editor already

Here's the data model:


create table user_submitted_comments (
	comment_id		integer primary key,
	user_id			not null references users,
	submission_time		date default sysdate not null,
	ip_address		varchar(50) not null,
	content			clob,
	approved_p		char(1) check(approved_p in ('t','f'))
);

Implicit in this model is the assumption that approved_p can be NULL and that, if not explicitly set during the INSERT, that is what it will default to. What about the check constraint? It would seem to restrict approved_p to values of "t" or "f". NULL, however, is a special value and if we wanted to prevent approved_p from taking on NULL we'd have to add an explicit not null constraint.

How do NULLs work with queries? Let's fill user_submitted_comments with some sample data and see:


insert into user_submitted_comments
(comment_id, user_id, ip_address, content)
values
(1, 23069, '18.30.2.68', 'This article reminds me of Hemingway');

Table created.

SQL> select first_names, last_name, content, user_submitted_comments.approved_p
from user_submitted_comments, users 
where user_submitted_comments.user_id = users.user_id;

FIRST_NAMES  LAST_NAME	     CONTENT				  APPROVED_P
------------ --------------- ------------------------------------ ------------
Philip	     Greenspun	     This article reminds me of Hemingway

We've successfully JOINed the user_submitted_comments and users table to get both the comment content and the name of the user who submitted it. Notice that in the select list we had to explicitly request user_submitted_comments.approved_p. This is because the users table also has an approved_p column.

When we inserted the comment row we did not specify a value for the approved_p column. Thus we expect that the value would be NULL and in fact that's what it seems to be. Oracle's SQL*Plus application indicates a NULL value with white space.

For the administration page, we'll want to show only those comments where the approved_p column is NULL:


SQL> select first_names, last_name, content, user_submitted_comments.approved_p
from user_submitted_comments, users 
where user_submitted_comments.user_id = users.user_id
and user_submitted_comments.approved_p = NULL;

no rows selected

"No rows selected"? That's odd. We know for a fact that we have one row in the comments table and that is approved_p column is set to NULL. How to debug the query? The first thing to do is simplify by removing the JOIN:


SQL> select * from user_submitted_comments where approved_p = NULL;

no rows selected

What is happening here is that any expression involving NULL evaluates to NULL, including one that effectively looks like "NULL = NULL". The WHERE clause is looking for expressions that evaluate to true. What you need to use is the special test IS NULL:


SQL> select * from user_submitted_comments where approved_p is NULL;

COMMENT_ID    USER_ID SUBMISSION_T IP_ADDRESS
---------- ---------- ------------ ----------
CONTENT 			     APPROVED_P
------------------------------------ ------------
	 1	23069 2000-05-27   18.30.2.68
This article reminds me of Hemingway

An adage among SQL programmers is that the only time you can use "= NULL" is in an UPDATE statement (to set a column's value to NULL). It never makes sense to use "= NULL" in a WHERE clause.

The bottom line is that as a data modeler you will have to decide which columns can be NULL and what that value will mean.

Back to the Mailing List

Let's return to the mailing list data model from the introduction:


create table mailing_list (
        email           varchar(100) not null primary key,
        name            varchar(100)
);

create table phone_numbers (
        email           varchar(100) not null references mailing_list,
        number_type     varchar(15) check (number_type in ('work','home','cell','beeper')),
        phone_number    varchar(20) not null
);

This data model locks you into some realities:

You will not be sending out any physical New Year's cards to folks on your mailing list; you don't have any way to store their addresses.
You will not be sending out any electronic mail to folks who work at companies with elaborate Lotus Notes configurations; sometimes Lotus Notes results in email addresses that are longer than 100 characters.
You are running the risk of filling the database with garbage since you have not constrained phone numbers in any way. American users could add or delete digits by mistake. International users could mistype country codes.
You are running the risk of not being able to serve rich people because the number_type column may be too constrained. Suppose William H. Gates the Third wishes to record some extra phone numbers with types of "boat", "ranch", "island", and "private_jet". The check (number_type in ('work','home','cell','beeper')) statement prevents Mr. Gates from doing this.
You run the risk of having records in the database for people whose name you don't know, since the name column of mailing_list is free to be NULL.
Changing a user's email address won't be the simplest possible operation. You're using email as a key in two tables and therefore will have to update both tables. The references mailing_list keeps you from making the mistake of only updating mailing_list and leaving orphaned rows in phone_numbers. But if users changed their email addresses frequently, you might not want to do things this way.
Since you've no provision for storing a password or any other means of authentication, if you allow users to update their information, you run a minor risk of allowing a malicious change. (The risk isn't as great as it seems because you probably won't be publishing the complete mailing list; an attacker would have to guess the names of people on your mailing list.)

These aren't necessarily bad realities in which to be locked. However, a good data modeler recognizes that every line of code in the .sql file has profound implications for the Web service.

To get some more information on how a simple datamodel for a Discussion Forum can evolve, read en:sql-wn-data_modeling-philip

Representing Web Site Core Content

Free-for-all Internet discussions can often be useful and occasionally are compelling, but the anchor of a good Web site is usually a set of carefully authored extended documents. Historically these have tended to be stored in the Unix file system and they don't change too often. Hence I refer to them as static pages. Examples of static pages on the photo.net server include this book chapter, the tutorial on light for photographers at http://www.photo.net/making-photographs/light.

We have some big goals to consider. We want the data in the database to

help community experts figure out which articles need revision and which new articles would be most valued by the community at large.
help contributors work together on a draft article or a new version of an old article.
collect and organize reader comments and discussion, both for presentation to other readers but also to assist authors in keeping content up-to-date.
collect and organize reader-submitted suggestions of related content out on the wider Internet (i.e., links).
help point readers to new or new-to-them content that might interest them, based on what they've read before or based on what kind of content they've said is interesting.

The big goals lead to some more concrete objectives:

We will need a table that holds the static pages themselves.
Since there are potentially many comments per page, we need a separate table to hold the user-submitted comments.
Since there are potentially many related links per page, we need a separate table to hold the user-submitted links.
Since there are potentially many authors for one page, we need a separate table to register the author-page many-to-one relation.
Considering the "help point readers to stuff that will interest them" objective, it seems that we need to store the category or categories under which a page falls. Since there are potentially many categories for one page, we need a separate table to hold the mapping between pages and categories.


create table static_pages (
	page_id		integer not null primary key,
	url_stub	varchar(400) not null unique,
	original_author	integer references users(user_id),
	page_title	varchar(4000),
	page_body	clob,
	obsolete_p	char(1) default 'f' check (obsolete_p in ('t','f')),
	members_only_p	char(1) default 'f' check (members_only_p in ('t','f')),
	price		number,
	copyright_info	varchar(4000),
	accept_comments_p	char(1) default 't' check (accept_comments_p in ('t','f')),
	accept_links_p		char(1) default 't' check (accept_links_p in ('t','f')),
	last_updated		date,
	-- used to prevent minor changes from looking like new content
	publish_date		date
);

create table static_page_authors (
	page_id		integer not null references static_pages,
	user_id		integer not null references users,
	notify_p	char(1) default 't' check (notify_p in ('t','f')),
	unique(page_id,user_id)
);

Note that we use a generated integer page_id key for this table. We could key the table by the url_stub (filename), but that would make it very difficult to reorganize files in the Unix file system (something that should actually happen very seldom on a Web server; it breaks links from foreign sites).

How to generate these unique integer keys when you have to insert a new row into static_pages? You could

lock the table
find the maximum page_id so far
add one to create a new unique page_id
insert the row
commit the transaction (releases the table lock)

Much better is to use Oracle's built-in sequence generation facility:


create sequence page_id_sequence start with 1;

Then we can get new page IDs by using page_id_sequence.nextval in INSERT statements (see the Transactions chapter for a fuller discussion of sequences).

Reference

Here is a summary of the data modeling tools available to you in Oracle, each hyperlinked to the Oracle documentation. This reference section covers the following:

data types en:sql-wn-data_modeling_data_types
statements for creating, altering, and dropping tables en:sql-wn-data_modeling_tables
constraints en:sql-wn-data_modeling_contraints

---

based on SQL for Web Nerds

Categories: developer (Audience)