test-doc - Weblog Page

Showing 171 - 180 of 230 Postings (summary)

SQL Tutorial

Created by Anett Szabo, last modified by Anett Szabo 26 Jul 2007, at 01:48 PM

What is SQL?

SQL (pronounced "ess-que-el") stands for Structured Query Language. SQL is used to communicate with a database. According to ANSI (American National Standards Institute), it is the standard language for relational database management systems. SQL statements are used to perform tasks such as update data on a database, or retrieve data from a database. Some common relational database management systems that use SQL are: Oracle, Sybase, Microsoft SQL Server, Access, Ingres, etc. Although most database systems use SQL, most of them also have their own additional proprietary extensions that are usually only used on their system. However, the standard SQL commands such as "Select", "Insert", "Update", "Delete", "Create", and "Drop" can be used to accomplish almost everything that one needs to do with a database. This tutorial will provide you with the instruction on the basics of each of these commands as well as allow you to put them to practice using the SQL Interpreter.

Table Basics

A relational database system contains one or more objects called tables. The data or information for the database are stored in these tables. Tables are uniquely identified by their names and are comprised of columns and rows. Columns contain the column name, data type, and any other attributes for the column. Rows contain the records or data for the columns. Here is a sample table called "weather".

city, state, high, and low are the columns. The rows contain the data for this table:

Weather
city	state	high	low
Phoenix	Arizona	105	90
Tucson	Arizona	101	92
Flagstaff	Arizona	88	69
San Diego	California	77	60
Albuquerque	New Mexico	80	72

Selecting Data

The select statement is used to query the database and retrieve selected data that match the criteria that you specify. Here is the format of a simple select statement:

select "column1"
  [,"column2",etc] 
  from "tablename"
  [where "condition"];
  [] = optional

The column names that follow the select keyword determine which columns will be returned in the results. You can select as many column names that you'd like, or you can use a "*" to select all columns.

The table name that follows the keyword from specifies the table that will be queried to retrieve the desired results.

The where clause (optional) specifies which data values or rows will be returned or displayed, based on the criteria described after the keyword where.

Conditional selections used in the where clause:

=	Equal
>	Greater than
<	Less than
>=	Greater than or equal
<=	Less than or equal
<>	Not equal to
LIKE	*See note below

The LIKE pattern matching operator can also be used in the conditional selection of the where clause. Like is a very powerful operator that allows you to select only rows that are "like" what you specify. The percent sign "%" can be used as a wild card to match any possible character that might appear before or after the characters specified. For example:

Sample Table: empinfo
first	last	id	age	city	state
John	Jones	99980	45	Payson	Arizona
Mary	Jones	99982	25	Payson	Arizona
Eric	Edwards	88232	32	San Diego	California
Mary Ann	Edwards	88233	32	Phoenix	Arizona
Ginger	Howell	98002	42	Cottonwood	Arizona
Sebastian	Smith	92001	23	Gila Bend	Arizona
Gus	Gray	22322	35	Bagdad	Arizona
Mary Ann	May	32326	52	Tucson	Arizona
Erica	Williams	32327	60	Show Low	Arizona
Leroy	Brown	32380	22	Pinetop	Arizona
Elroy	Cleaver	32382	22	Globe	Arizona

select first, last, city
   from empinfo
   where first LIKE 'Er%';

This SQL statement will match any first names that start with 'Er'. Strings must be in single quotes.

Or you can specify,

select first, last
   from empinfo
   where last LIKE '%s';

This statement will match any last names that end in a 's'.

select * from empinfo
   where first = 'Eric';

This will only select rows where the first name equals 'Eric' exactly.

What will be the result of the following sample SELECT statements?

select first, last, city from empinfo;

first	last	city
John	Jones	Payson
Mary	Jones	Payson
Eric	Edwards	San Diego
Mary Ann	Edwards	Phoenix
Ginger	Howell	Cottonwood
Sebastian	Smith	Gila Bend
Gus	Gray	Bagdad
Mary Ann	May	Tucson
Erica	Williams	Show Low
Leroy	Brown	Pinetop
Elroy	Cleaver	Globe

select last, city, age from empinfo 
where age > 30;

last	city	age
Jones	Payson	45
Edwards	San Diego	32
Edwards	Phoenix	32
Howell	Cottonwood	42
Gray	Bagdad	35
May	Tucson	52
Williams	Show Low	60

select first, last, city, state from empinfo
       where first LIKE 'J%';

first	last	city	state
John	Jones	Payson	Arizona

select * from empinfo;

first	last	id	age	city	state
John	Jones	99980	45	Payson	Arizona
Mary	Jones	99982	25	Payson	Arizona
Eric	Edwards	88232	32	San Diego	California
Mary Ann	Edwards	88233	32	Phoenix	Arizona
Ginger	Howell	98002	42	Cottonwood	Arizona
Sebastian	Smith	92001	23	Gila Bend	Arizona
Gus	Gray	22322	35	Bagdad	Arizona
Mary Ann	May	32326	52	Tucson	Arizona
Erica	Williams	32327	60	Show Low	Arizona
Leroy	Brown	32380	22	Pinetop	Arizona
Elroy	Cleaver	32382	22	Globe	Arizona

select first, last, from empinfo
       where last LIKE '%s';

first	last
John	Jones
Mary	Jones
Eric	Edwards
Mary Ann	Edwards
Erica	Williams

 select first, last, age from empinfo
       where last LIKE '%illia%';

first	last	age
Erica	Williams	60

select * from empinfo where first = 'Eric';

first	last	id	age	city	state
Eric	Edwards	88232	32	San Diego	California

Exercise 1.

Enter select statements to:

Display the first name and age for everyone that's in the table.
Display the first name, last name, and city for everyone that's not from Payson.
Display all columns for everyone that is over 40 years old.
Display the first and last names for everyone whose last name ends in an "ay".
Display all columns for everyone whose first name equals "Mary".
Display all columns for everyone whose first name contains "Mary".

Answers

Exercise 2.

items_ordered

customerid	order_date	item	quantity	price
10330	30-Jun-1999	Pogo stick	1	28.00
10101	30-Jun-1999	Raft	1	58.00
10298	01-Jul-1999	Skateboard	1	33.00
10101	01-Jul-1999	Life Vest	4	125.00
10299	06-Jul-1999	Parachute	1	1250.00
10339	27-Jul-1999	Umbrella	1	4.50
10449	13-Aug-1999	Unicycle	1	180.79
10439	14-Aug-1999	Ski Poles	2	25.50
10101	18-Aug-1999	Rain Coat	1	18.30
10449	01-Sep-1999	Snow Shoes	1	45.00
10439	18-Sep-1999	Tent	1	88.00
10298	19-Sep-1999	Lantern	2	29.00
10410	28-Oct-1999	Sleeping Bag	1	89.22
10438	01-Nov-1999	Umbrella	1	6.75
10438	02-Nov-1999	Pillow	1	8.50
10298	01-Dec-1999	Helmet	1	22.00
10449	15-Dec-1999	Bicycle	1	380.50
10449	22-Dec-1999	Canoe	1	280.00
10101	30-Dec-1999	Hoola Hoop	3	14.75
10330	01-Jan-2000	Flashlight	4	28.00
10101	02-Jan-2000	Lantern	1	16.00
10299	18-Jan-2000	Inflatable Mattress	1	38.00
10438	18-Jan-2000	Tent	1	79.99
10413	19-Jan-2000	Lawnchair	4	32.00
10410	30-Jan-2000	Unicycle	1	192.50
10315	2-Feb-2000	Compass	1	8.00
10449	29-Feb-2000	Flashlight	1	4.50
10101	08-Mar-2000	Sleeping Bag	2	88.70
10298	18-Mar-2000	Pocket Knife	1	22.38
10449	19-Mar-2000	Canoe paddle	2	40.00
10298	01-Apr-2000	Ear Muffs	1	12.50
10330	19-Apr-2000	Shovel	1	16.75

Enter select statements to:

From the items_ordered table, select a list of all items purchased for customerid 10449. Display the customerid, item, and price for this customer.
Select all columns from the items_ordered table for whoever purchased a Tent.
Select the customerid, order_date, and item values from the items_ordered table for any items in the item column that start with the letter "S".
Select the distinct items in the items_ordered table. In other words, display a listing of each of the unique items from the items_ordered table.
Make up your own select statements and submit them.

Answers

Aggregate Functions

MIN	returns the smallest value in a given column
MAX	returns the largest value in a given column
SUM	returns the sum of the numeric values in a given column
AVG	returns the average value of a given column
COUNT	returns the total number of values in a given column
COUNT(*)	returns the number of rows in a table

Aggregate functions are used to compute against a "returned column of numeric data" from your SELECT statement. They basically summarize the results of a particular column of selected data. We are covering these here since they are required by the next topic, "GROUP BY". Although they are required for the "GROUP BY" clause, these functions can be used without the "GROUP BY" clause. For example:

SELECT AVG(salary)
FROM employee;

This statement will return a single result which contains the average value of everything returned in the salary column from the employee table.

Another example:

SELECT AVG(salary)
    FROM employee;
       WHERE title = 'Programmer';

This statement will return the average salary for all employees whose title is equal to 'Programmer'

Example:

value	price_apiece
1	10
100	9
500	8
1000	7

Example:

SELECT Count(*)
FROM employees;

This particular statement is slightly different from the other aggregate functions since there isn't a column supplied to the count function. This statement will return the number of rows in the employees table.

Exercise 3.

Select the maximum price of any item ordered in the items_ordered table ( see above ). Hint: Select the maximum price only.>
Select the average price of all of the items ordered that were purchased in the month of Dec.
What are the total number of rows in the items_ordered table?
For all of the tents that were ordered in the items_ordered table, what is the price of the lowest tent? Hint: Your query should return the price only.

Answers

GROUP BY clause

The GROUP BY clause will gather all of the rows together that contain data in the specified column(s) and will allow aggregate functions to be performed on the one or more columns. This can best be explained by an example:

GROUP BY clause syntax:

SELECT column1, 
SUM(column2)
FROM "list-of-tables"
GROUP BY "column-list";

Let's say you would like to retrieve a list of the highest paid salaries in each dept:

SELECT max(salary), dept
FROM employee 
GROUP BY dept;

This statement will select the maximum salary for the people in each unique department. Basically, the salary for the person who makes the most in each department will be displayed. Their, salary and their department will be returned.

What if I wanted to display their lastname too?

For example, take a look at the items_ordered table. Let's say you want to group everything of quantity 1 together, everything of quantity 2 together, everything of quantity 3 together, etc. If you would like to determine what the largest cost item is for each grouped quantity (all quantity 1's, all quantity 2's, all quantity 3's, etc.), you would enter:

SELECT quantity, max(price)
FROM items_ordered
GROUP BY quantity;

Enter the statement in above, and take a look at the results to see if it returned what you were expecting. Verify that the maximum price in each Quantity Group is really the maximum price.

Exercise 4.

customerid	firstname	lastname	city	state
10101	John	Gray	Lynden	Washington
10298	Leroy	Brown	Pinetop	Arizona
10299	Elroy	Keller	Snoqualmie	Washington
10315	Lisa	Jones	Oshkosh	Wisconsin
10325	Ginger	Schultz	Pocatello	Idaho
10329	Kelly	Mendoza	Kailua	Hawaii
10330	Shawn	Dalton	Cannon Beach	Oregon
10338	Michael	Howell	Tillamook	Oregon
10339	Anthony	Sanchez	Winslow	Arizona
10408	Elroy	Cleaver	Globe	Arizona
10410	Mary Ann	Howell	Charleston	South Carolina
10413	Donald	Davids	Gila Bend	Arizona
10419	Linda	Sakahara	Nogales	Arizona
10429	Sarah	Graham	Greensboro	North Carolina
10438	Kevin	Smith	Durango	Colorado
10439	Conrad	Giles	Telluride	Colorado
10449	Isabela	Moore	Yuma	Arizona

How many people are in each unique state in the customers table? Select the state and display the number of people in each. Hint: count is used to count rows in a column, sum works on numeric data only.
From the items_ordered table, select the item, maximum price, and minimum price for each specific item in the table. Hint: The items will need to be broken up into separate groups.
How many orders did each customer make? Use the items_ordered table. Select the customerid, number of orders they made, and the sum of their orders. Click the Group By answers link below if you have any problems.

Answers

HAVING clause

The HAVING clause allows you to specify conditions on the rows for each group - in other words, which rows should be selected will be based on the conditions you specify. The HAVING clause should follow the GROUP BY clause if you are going to use it.

HAVING clause syntax:

SELECT column1, 
SUM(column2)
FROM "list-of-tables"
GROUP BY "column-list"
HAVING "condition";

HAVING can best be described by example. Let's say you have an employee table containing the employee's name, department, salary, and age. If you would like to select the average salary for each employee in each department, you could enter:

SELECT dept, avg(salary)
FROM employee
GROUP BY dept;

But, let's say that you want to ONLY calculate & display the average if their salary is over 20000:

SELECT dept, avg(salary)
FROM employee
GROUP BY dept
HAVING avg(salary) > 20000;

Exercise 5.

How many people are in each unique state in the customers table that have more than one person in the state? Select the state and display the number of how many people are in each if it's greater than 1.
From the items_ordered table, select the item, maximum price, and minimum price for each specific item in the table. Only display the results if the maximum price for one of the items is greater than 190.00.
How many orders did each customer make? Use the items_ordered table. Select the customerid, number of orders they made, and the sum of their orders if they purchased more than 1 item.

Answers

ORDER BY clause

ORDER BY is an optional clause which will allow you to display the results of your query in a sorted order (either ascending order or descending order) based on the columns that you specify to order by.

ORDER BY clause syntax:

SELECT column1, SUM(column2)
FROM "list-of-tables"
ORDER BY "column-list" [ASC | DESC];

[ ] = optional

This statement will select the employee_id, dept, name, age, and salary from the employee_info table where the dept equals 'Sales' and will list the results in Ascending (default) order based on their Salary.

ASC = Ascending Order - default
DESC = Descending Order

For example:

SELECT employee_id, dept, name, age, salary
FROM employee_info
WHERE dept = 'Sales'ORDER BY salary;

If you would like to order based on multiple columns, you must seperate the columns with commas. For example:

SELECT employee_id, dept, name, age, salary
FROM employee_info
WHERE dept = 'Sales'
ORDER BY salary, age DESC;

Exercise 6.

Select the lastname, firstname, and city for all customers in the customers table. Display the results in Ascending Order based on the lastname.
Same thing as exercise #1, but display the results in Descending order.
Select the item and price for all of the items in the items_ordered table that the price is greater than 10.00. Display the results in Ascending order based on the price.

Answers

Combining conditions and Boolean Operators

The AND operator can be used to join two or more conditions in the WHERE clause. Both sides of the AND condition must be true in order for the condition to be met and for those rows to be displayed.

SELECT column1, 
SUM(column2)
FROM "list-of-tables"
WHERE "condition1" AND 
"condition2";

The OR operator can be used to join two or more conditions in the WHERE clause also. However, either side of the OR operator can be true and the condition will be met - hence, the rows will be displayed. With the OR operator, either side can be true or both sides can be true.

For example:

SELECT employeeid, firstname, lastname, title, salary
FROM employee_info
WHERE salary >= 50000.00 AND title = 'Programmer';

This statement will select the employeeid, firstname, lastname, title, and salary from the employee_info table where the salary is greater than or equal to 50000.00 AND the title is equal to 'Programmer'. Both of these conditions must be true in order for the rows to be returned in the query. If either is false, then it will not be displayed.

Although they are not required, you can use paranthesis around your conditional expressions to make it easier to read:

SELECT employeeid, firstname, lastname, title, salary
FROM employee_info
WHERE (salary >= 50000.00) AND (title = 'Programmer');

Another Example:

SELECT firstname, lastname, title, salary
FROM employee_info
WHERE (title = 'Sales') OR (title = 'Programmer');

This statement will select the firstname, lastname, title, and salary from the employee_info table where the title is either equal to 'Sales' OR the title is equal to 'Programmer'.

Exercise 7.

Select the customerid, order_date, and item from the items_ordered table for all items unless they are 'Snow Shoes' or if they are 'Ear Muffs'. Display the rows as long as they are not either of these two items.
Select the item and price of all items that start with the letters 'S', 'P', or 'F'.

Answers

IN and BETWEEN Conditional Operators

SELECT col1, SUM(col2)
FROM "list-of-tables"
WHERE col3 IN 
       (list-of-values);
SELECT col1, SUM(col2)
FROM "list-of-tables"
WHERE col3 BETWEEN value1 
AND value2;

The IN conditional operator is really a set membership test operator. That is, it is used to test whether or not a value (stated before the keyword IN) is "in" the list of values provided after the keyword IN.

For example:

SELECT employeeid, lastname, salary
FROM employee_info
WHERE lastname IN ('Hernandez', 'Jones', 'Roberts', 'Ruiz');

This statement will select the employeeid, lastname, salary from the employee_info table where the lastname is equal to either: Hernandez, Jones, Roberts, or Ruiz. It will return the rows if it is ANY of these values.

The IN conditional operator can be rewritten by using compound conditions using the equals operator and combining it with OR - with exact same output results:

SELECT employeeid, lastname, salary
FROM employee_info
WHERE lastname = 'Hernandez' OR lastname = 'Jones' 
OR lastname = 'Roberts'OR lastname = 'Ruiz';

As you can see, the IN operator is much shorter and easier to read when you are testing for more than two or three values.
You can also use NOT IN to exclude the rows in your list.

The BETWEEN conditional operator is used to test to see whether or not a value (stated before the keyword BETWEEN) is "between" the two values stated after the keyword BETWEEN.

For example:

SELECT employeeid, age, lastname, salary
FROM employee_info
WHERE age BETWEEN 30 AND 40;

This statement will select the employeeid, age, lastname, and salary from the employee_info table where the age is between 30 and 40 (including 30 and 40).

This statement can also be rewritten without the BETWEEN operator:

SELECT employeeid, age, lastname, salary
FROM employee_info
WHERE age >= 30 AND age <= 40;

You can also use NOT BETWEEN to exclude the values between your range.

Exercise 8.

Select the date, item, and price from the items_ordered table for all of the rows that have a price value ranging from 10.00 to 80.00.
Select the firstname, city, and state from the customers table for all of the rows where the state value is either: Arizona, Washington, Oklahoma, Colorado, or Hawaii.

Answers

Mathematical Operators

Standard ANSI SQL-92 supports the following first four basic arithmetic operators:

+	addition
-	subtraction
*	multiplication
/	division
%	modulo

The modulo operator determines the integer remainder of the division. This operator is not ANSI SQL supported, however, most databases support it. The following are some more useful mathematical functions to be aware of since you might need them. These functions are not standard in the ANSI SQL-92 specs, therefore they may or may not be available on the specific RDBMS that you are using. However, they were available on several major database systems that I tested. They WILL work on this tutorial.

ABS(x)	returns the absolute value of x
SIGN(x)	returns the sign of input x as -1, 0, or 1 (negative, zero, or positive respectively)
MOD(x,y)	modulo - returns the integer remainder of x divided by y (same as x%y)
FLOOR(x)	returns the largest integer value that is less than or equal to x
CEILING(x) or CEIL(x)	returns the smallest integer value that is greater than or equal to x
POWER(x,y)	returns the value of x raised to the power of y
ROUND(x)	returns the value of x rounded to the nearest whole integer
ROUND(x,d)	returns the value of x rounded to the number of decimal places specified by the value d
SQRT(x)	returns the square-root value of x

For example:

SELECT round(salary), firstname
FROM employee_info

This statement will select the salary rounded to the nearest whole value and the firstname from the employee_info table.

Exercise 9.

Select the item and per unit price for each item in the items_ordered table. Hint: Divide the price by the quantity.

Answer

Table Joins, a must

All of the queries up until this point have been useful with the exception of one major limitation - that is, you've been selecting from only one table at a time with your SELECT statement. It is time to introduce you to one of the most beneficial features of SQL & relational database systems - the "Join". To put it simply, the "Join" makes relational database systems "relational".

Joins allow you to link data from two or more tables together into a single query result--from one single SELECT statement.

A "Join" can be recognized in a SQL SELECT statement if it has more than one table after the FROM keyword.

For example:

SELECT "list-of-columns"
FROM table1,table2
WHERE "search-condition(s)"

Joins can be explained easier by demonstrating what would happen if you worked with one table only, and didn't have the ability to use "joins". This single table database is also sometimes referred to as a "flat table". Let's say you have a one-table database that is used to keep track of all of your customers and what they purchase from your store:

id	first	last	address	city	state	zip	date	item	price

Everytime a new row is inserted into the table, all columns will be be updated, thus resulting in unnecessary "redundant data". For example, every time Wolfgang Schultz purchases something, the following rows will be inserted into the table:

id	first	last	address	city	state	zip	date	item	price
10982	Wolfgang	Schultz	300 N. 1st Ave	Yuma	AZ	85002	032299	snowboard	45.00
10982	Wolfgang	Schultz	300 N. 1st Ave	Yuma	AZ	85002	082899	snow shovel	35.00
10982	Wolfgang	Schultz	300 N. 1st Ave	Yuma	AZ	85002	091199	gloves	15.00
10982	Wolfgang	Schultz	300 N. 1st Ave	Yuma	AZ	85002	100999	lantern	35.00
10982	Wolfgang	Schultz	300 N. 1st Ave	Yuma	AZ	85002	022900	tent	85.00

An ideal database would have two tables:

One for keeping track of your customers
And the other to keep track of what they purchase:

"Customer_info" table:

customer_number	firstname	lastname	address	city	state	zip

"Purchases" table:

customer_number	date	item	price

Now, whenever a purchase is made from a repeating customer, the 2nd table, "Purchases" only needs to be updated! We've just eliminated useless redundant data, that is, we've just normalized this database!

Notice how each of the tables have a common "cusomer_number" column. This column, which contains the unique customer number will be used to JOIN the two tables. Using the two new tables, let's say you would like to select the customer's name, and items they've purchased. Here is an example of a join statement to accomplish this:

SELECT customer_info.firstname, customer_info.lastname, purchases.item
FROM customer_info, purchases
WHERE customer_info.customer_number = purchases.customer_number;

This particular "Join" is known as an "Inner Join" or "Equijoin". This is the most common type of "Join" that you will see or use.

Notice that each of the colums are always preceeded with the table name and a period. This isn't always required, however, it IS good practice so that you wont confuse which colums go with what tables. It is required if the name column names are the same between the two tables. I recommend preceeding all of your columns with the table names when using joins.

Note: The syntax described above will work with most Database Systems -including the one with this tutorial. However, in the event that this doesn't work with yours, please check your specific database documentation.

Although the above will probably work, here is the ANSI SQL-92 syntax specification for an Inner Join using the preceding statement above that you might want to try:

SELECT customer_info.firstname, customer_info.lastname, purchases.item
FROM customer_info INNER JOIN purchases
ON customer_info.customer_number = purchases.customer_number;

SELECT customer_info.firstname, customer_info.lastname, purchases.item
FROM customer_info INNER JOIN purchases
ON customer_info.customer_number = purchases.customer_number;

Another example:

SELECT employee_info.employeeid, employee_info.lastname, employee_sales.comission
FROM employee_info, employee_sales
WHERE employee_info.employeeid = employee_sales.employeeid;

This statement will select the employeeid, lastname (from the employee_info table), and the comission value (from the employee_sales table) for all of the rows where the employeeid in the employee_info table matches the employeeid in the employee_sales table.

Exercises 10.

Write a query using a join to determine which items were ordered by each of the customers in the customers table. Select the customerid, firstname, lastname, order_date, item, and price for everything each customer purchased in the items_ordered table.
Repeat exercise #1, however display the results sorted by state in descending order.

Answers

Creating Tables

The create table statement is used to create a new table. Here is the format of a simple create table statement:

create table "tablename"
("column1" "data type",
 "column2" "data type",
 "column3" "data type");

Format of create table if you were to use optional constraints:

create table "tablename"
("column1" "data type" 
         [constraint],
 "column2" "data type" 
         [constraint],
 "column3" "data type" 
        [constraint]);
 [ ] = optional

Note: You may have as many columns as you'd like, and the constraints are optional.

Example:

create table employee
(first varchar(15),
 last varchar(20),
 age number(3),
 address varchar(30),
 city varchar(20),
 state varchar(20));

To create a new table, enter the keywords create table followed by the table name, followed by an open parenthesis, followed by the first column name, followed by the data type for that column, followed by any optional constraints, and followed by a closing parenthesis. It is important to make sure you use an open parenthesis before the beginning table, and a closing parenthesis after the end of the last column definition. Make sure you seperate each column definition with a comma. All SQL statements should end with a ";".

The table and column names must start with a letter and can be followed by letters, numbers, or underscores - not to exceed a total of 30 characters in length. Do not use any SQL reserved keywords as names for tables or column names (such as "select", "create", "insert", etc).

Data types specify what the type of data can be for that particular column. If a column called "Last_Name", is to be used to hold names, then that particular column should have a "varchar" (variable-length character) data type.

Here are the most common Data types:

`char(size)`	Fixed-length character string. Size is specified in parenthesis. Max 255 bytes.
`varchar(size)`	Variable-length character string. Max size is specified in parenthesis.
`number(size)`	Number value with a max number of column digits specified in parenthesis.
`date`	Date value
`number(size,d)`	Number value with a maximum number of digits of "size" total, with a maximum number of "d" digits to the right of the decimal.

What are constraints? When tables are created, it is common for one or more columns to have constraints associated with them. A constraint is basically a rule associated with a column that the data entered into that column must follow. For example, a "unique" constraint specifies that no two records can have the same value in a particular column. They must all be unique. The other two most popular constraints are "not null" which specifies that a column can't be left blank, and "primary key". A "primary key" constraint defines a unique identification of each record (or row) in a table. All of these and more will be covered in the future Advanced release of this Tutorial. Constraints can be entered in this SQL interpreter, however, they are not supported in this Intro to SQL tutorial & interpreter. They will be covered and supported in the future release of the Advanced SQL tutorial - that is, if "response" is good.

It's now time for you to design and create your own table. You will use this table throughout the rest of the tutorial. If you decide to change or redesign the table, you can either drop it and recreate it or you can create a completely different one. The SQL statement drop will be covered later.

Exercise 11.

You have just started a new company. It is time to hire some employees. You will need to create a table that will contain the following information about your new employees: firstname, lastname, title, age, and salary.

Answer

Inserting into a Table

The insert statement is used to insert or add a row of data into the table.

To insert records into a table, enter the key words insert into followed by the table name, followed by an open parenthesis, followed by a list of column names separated by commas, followed by a closing parenthesis, followed by the keyword values, followed by the list of values enclosed in parenthesis. The values that you enter will be held in the rows and they will match up with the column names that you specify. Strings should be enclosed in single quotes, and numbers should not.

insert into "tablename"
 (first_column,...last_column)
  values (first_value,...last_value);

In the example below, the column name first will match up with the value 'Luke', and the column name state will match up with the value 'Georgia'.

Example:

insert into employee
  (first, last, age, address, city, state)
  values ('Luke', 'Duke', 45, '2130 Boars Nest', 
          'Hazard Co', 'Georgia');

Note: All strings should be enclosed between single quotes: 'string'

Exercise 12.

It is time to insert data into your new employee table.

Your first three employees are the following:

Jonie Weber, Secretary, 28, 19500.00
Potsy Weber, Programmer, 32, 45300.00
Dirk Smith, Programmer II, 45, 75020.00

Enter these employees into your table first, and then insert at least 5 more of your own list of employees in the table.

After they're inserted into the table, enter select statements to:

Select all columns for everyone in your employee table.
Select all columns for everyone with a salary over 30000.
Select first and last names for everyone that's under 30 years old.
Select first name, last name, and salary for anyone with "Programmer" in their title.
Select all columns for everyone whose last name contains "ebe".
Select the first name for everyone whose first name equals "Potsy".
Select all columns for everyone over 80 years old.
Select all columns for everyone whose last name ends in "ith".

Answers

Updating Records

The update statement is used to update or change records that match a specified criteria. This is accomplished by carefully constructing a where clause.

update "tablename"
set "columnname" = 
    "newvalue"
 [,"nextcolumn" = 
   "newvalue2"...]
where "columnname" 
  OPERATOR "value" 
 [and|or "column" 
  OPERATOR "value"];

 [] = optional

[The above example was line wrapped for better viewing on this Web page.]

Examples:

update phone_book
  set area_code = 623
  where prefix = 979;

update phone_book
  set last_name = 'Smith', prefix=555, suffix=9292
  where last_name = 'Jones';

update employee
  set age = age+1
  where first_name='Mary' and last_name='Williams';

Exercise 13.

After each update, issue a select statement to verify your changes.

Jonie Weber just got married to Bob Williams. She has requested that her last name be updated to Weber-Williams.
Dirk Smith's birthday is today, add 1 to his age.
All secretaries are now called "Administrative Assistant". Update all titles accordingly.
Everyone that's making under 30000 are to receive a 3500 a year raise.
Everyone that's making over 33500 are to receive a 4500 a year raise.
All "Programmer II" titles are now promoted to "Programmer III".
All "Programmer" titles are now promoted to "Programmer II".

Answers

Deleting Records

The delete statement is used to delete records or rows from the table.

delete from "tablename"

where "columnname" 
  OPERATOR "value" 
[and|or "column" 
  OPERATOR "value"];

[ ] = optional

[The above example was line wrapped for better viewing on this Web page.]

Examples:

delete from employee;

Note: if you leave off the where clause, all records will be deleted!

delete from employee
  where lastname = 'May';

delete from employee
  where firstname = 'Mike' or firstname = 'Eric';

To delete an entire record/row from a table, enter "delete from" followed by the table name, followed by the where clause which contains the conditions to delete. If you leave off the where clause, all records will be deleted.

Exercise 14.

(Use the select statement to verify your deletes):

Jonie Weber-Williams just quit, remove her record from the table.
It's time for budget cuts. Remove all employees who are making over 70000 dollars.

Answers

Drop a Table

The drop table command is used to delete a table and all rows in the table.

To delete an entire table including all of its rows, issue the drop table command followed by the tablename. drop table is different from deleting all of the records in the table. Deleting all of the records in the table leaves the table including column and constraint information. Dropping the table removes the table definition as well as all of its rows.

drop table "tablename"

Example:

drop table myemployees_ts0211;

Exercise 15.

Drop your employee table.

---

based on http://sqlcourse2.com/

Categories: beginner (Audience)

Simple queries

Created by Anett Szabo, last modified by Anett Szabo 26 Jul 2007, at 12:11 PM

If you start up SQL*Plus, you can start browsing around immediately with the SELECT statement. You don't even need to define a table; Oracle provides the built-in dual table for times when you're interested in a constant or a function:


SQL> select 'Hello World' from dual;

'HELLOWORLD
-----------
Hello World

SQL> select 2+2 from dual;

       2+2
----------
	 4

SQL> select sysdate from dual;

SYSDATE
----------
1999-02-14

... or to test your knowledge of three-valued logic (see the "Data Modeling" chapter):


SQL> select 4+NULL from dual;

    4+NULL
----------

(any expression involving NULL evaluates to NULL).

There is nothing magic about the dual table for these purposes; you can compute functions using the bboard table instead of dual:


select sysdate,2+2,atan2(0, -1) from bboard;

SYSDATE 	  2+2 ATAN2(0,-1)
---------- ---------- -----------
1999-01-14	    4  3.14159265
1999-01-14	    4  3.14159265
1999-01-14	    4  3.14159265
1999-01-14	    4  3.14159265
...

1999-01-14	    4  3.14159265
1999-01-14	    4  3.14159265
1999-01-14	    4  3.14159265

55010 rows selected.

but not everyone wants 55010 copies of the same result. The dual table is predefined during Oracle installation and, though it is just a plain old table, it is guaranteed to contain only one row because no user will have sufficient privileges to insert or delete rows from dual.

Getting beyond Hello World

To get beyond Hello World, pick a table of interest. As we saw in the introduction,

select * from users;

would retrieve all the information from every row of the users table. That's good for toy systems but in any production system, you'd be better off starting with


SQL> select count(*) from users;

  COUNT(*)
----------
      7352

You don't really want to look at 7352 rows of data, but you would like to see what's in the users table, start off by asking SQL*Plus to query Oracle's data dictionary and figure out what columns are available in the users table:


SQL> describe users
 Name				 Null?	  Type
 ------------------------------- -------- ----
 USER_ID			 NOT NULL NUMBER(38)
 FIRST_NAMES			 NOT NULL VARCHAR2(100)
 LAST_NAME			 NOT NULL VARCHAR2(100)
 PRIV_NAME				  NUMBER(38)
 EMAIL				 NOT NULL VARCHAR2(100)
 PRIV_EMAIL				  NUMBER(38)
 EMAIL_BOUNCING_P			  CHAR(1)
 PASSWORD			 NOT NULL VARCHAR2(30)
 URL					  VARCHAR2(200)
 ON_VACATION_UNTIL			  DATE
 LAST_VISIT				  DATE
 SECOND_TO_LAST_VISIT			  DATE
 REGISTRATION_DATE			  DATE
 REGISTRATION_IP			  VARCHAR2(50)
 ADMINISTRATOR_P			  CHAR(1)
 DELETED_P				  CHAR(1)
 BANNED_P				  CHAR(1)
 BANNING_USER				  NUMBER(38)
 BANNING_NOTE				  VARCHAR2(4000)

The data dictionary is simply a set of built-in tables that Oracle uses to store information about the objects (tables, triggers, etc.) that have been defined. Thus SQL*Plus isn't performing any black magic when you type describe; it is simply querying user_tab_columns, a view of some of the tables in Oracle's data dictionary. You could do the same explicitly, but it is a little cumbersome.


column fancy_type format a20
select column_name, data_type || '(' || data_length || ')' as fancy_type
from user_tab_columns
where table_name = 'USERS'
order by column_id;

Here we've had to make sure to put the table name ("USERS") in all-uppercase. Oracle is case-insensitive for table and column names in queries but the data dictionary records names in uppercase. Now that we know the names of the columns in the table, it will be easy to explore.

Simple Queries from One Table

A simple query from one table has the following structure:

the select list (which columns in our report)
the name of the table
the where clauses (which rows we want to see)
the order by clauses (how we want the rows arranged)

Let's see some examples. First, let's see how many users from MIT are registered on our site:


SQL> select email
from users
where email like '%mit.edu';

EMAIL
------------------------------
philg@mit.edu
andy@california.mit.edu
ben@mit.edu
...
wollman@lcs.mit.edu
ghomsy@mit.edu
hal@mit.edu
...
jpearce@mit.edu
richmond@alum.mit.edu
andy_roo@mit.edu
kov@mit.edu
fletch@mit.edu
lsandon@mit.edu
psz@mit.edu
philg@ai.mit.edu
philg@martigny.ai.mit.edu
andy@californnia.mit.edu
ty@mit.edu
teadams@mit.edu

68 rows selected.

The email like '%mit.edu' says "every row where the email column ends in 'mit.edu'". The percent sign is Oracle's wildcard character for "zero or more characters". Underscore is the wildcard for "exactly one character":


SQL> select email 
from users
where email like '___@mit.edu';

EMAIL
------------------------------
kov@mit.edu
hal@mit.edu
...
ben@mit.edu
psz@mit.edu

Suppose that we notice in the above report some similar email addresses. It is perhaps time to try out the ORDER BY clause:


SQL> select email
from users
where email like '%mit.edu'
order by email;

EMAIL
------------------------------
andy@california.mit.edu
andy@californnia.mit.edu
andy_roo@mit.edu
...
ben@mit.edu
...
hal@mit.edu
...
philg@ai.mit.edu
philg@martigny.ai.mit.edu
philg@mit.edu

Now we can see that this users table was generated by grinding over pre-ArsDigita Community Systems postings starting from 1995. In those bad old days, users typed their email address and name with each posting. Due to typos and people intentionally choosing to use different addresses at various times, we can see that we'll have to build some sort of application to help human beings merge some of the rows in the users table (e.g., all three occurrences of "philg" are in fact the same person (me)).

Restricting results

Suppose that you were featured on Yahoo in September 1998 and want to see how many users signed up during that month:


SQL> select count(*) 
from users
where registration_date >= '1998-09-01' 
and registration_date < '1998-10-01';

  COUNT(*)
----------
       920

We've combined two restrictions in the WHERE clause with an AND. We can add another restriction with another AND:


SQL> select count(*) 
from users
where registration_date >= '1998-09-01' 
and registration_date < '1998-10-01'
and email like '%mit.edu';

  COUNT(*)
----------
        35

OR and NOT are also available within the WHERE clause. For example, the following query will tell us how many classified ads we have that either have no expiration date or whose expiration date is later than the current date/time.


select count(*) 
from classified_ads
where expires >= sysdate
or expires is null;

Subqueries

You can query one table, restricting the rows returned based on information from another table. For example, to find users who have posted at least one classified ad:


select user_id, email 
from users 
where 0 < (select count(*) 
           from classified_ads
           where classified_ads.user_id = users.user_id);

   USER_ID EMAIL
---------- -----------------------------------
     42485 twm@meteor.com
     42489 trunghau@ecst.csuchico.edu
     42389 ricardo.carvajal@kbs.msu.edu
     42393 gon2foto@gte.net
     42399 rob@hawaii.rr.com
     42453 stefan9@ix.netcom.com
     42346 silverman@pon.net
     42153 gallen@wesleyan.edu
...

Conceptually, for each row in the users table Oracle is running the subquery against classified_ads to see how many ads are associated with that particular user ID. Keep in mind that this is only conceptually; the Oracle SQL parser may elect to execute this query in a more efficient manner.

Another way to describe the same result set is using EXISTS:


select user_id, email 
from users 
where exists (select 1
              from classified_ads
              where classified_ads.user_id = users.user_id);

This may be more efficient for Oracle to execute since it hasn't been instructed to actually count the number of classified ads for each user, but only to check and see if any are present. Think of EXISTS as a Boolean function that

takes a SQL query as its only parameter
returns TRUE if the query returns any rows at all, regardless of the contents of those rows (this is why we can use the constant 1 as the select list for the subquery)

JOIN

A professional SQL programmer would be unlikely to query for users who'd posted classified ads in the preceding manner. The SQL programmer knows that, inevitably, the publisher will want information from the classified ad table along with the information from the users table. For example, we might want to see the users and, for each user, the sequence of ad postings:


select users.user_id, users.email, classified_ads.posted
from users, classified_ads
where users.user_id = classified_ads.user_id
order by users.email, posted;

   USER_ID EMAIL			       POSTED
---------- ----------------------------------- ----------
     39406 102140.1200@compuserve.com	       1998-09-30
     39406 102140.1200@compuserve.com	       1998-10-08
     39406 102140.1200@compuserve.com	       1998-10-08
     39842 102144.2651@compuserve.com	       1998-07-02
     39842 102144.2651@compuserve.com	       1998-07-06
     39842 102144.2651@compuserve.com	       1998-12-13
...
     41284 yme@inetport.com		       1998-01-25
     41284 yme@inetport.com		       1998-02-18
     41284 yme@inetport.com		       1998-03-08
     35389 zhupanov@usa.net		       1998-12-10
     35389 zhupanov@usa.net		       1998-12-10
     35389 zhupanov@usa.net		       1998-12-10

Because of the JOIN restriction,

where users.user_id =
classified_ads.user_id

, we only see those users who have posted at least one classified ad, i.e., for whom a matching row may be found in the classified_ads table. This has the same effect as the subquery above.

The order by users.email, posted is key to making sure that the rows are lumped together by user and then printed in order of ascending posting time.

OUTER JOIN

Suppose that we want an alphabetical list of all of our users, with classified ad posting dates for those users who have posted classifieds. We can't do a simple JOIN because that will exclude users who haven't posted any ads. What we need is an OUTER JOIN, where Oracle will "stick in NULLs" if it can't find a corresponding row in the classified_ads table.


select users.user_id, users.email, classified_ads.posted
from users, classified_ads
where users.user_id = classified_ads.user_id(+)
order by users.email, posted;

...
   USER_ID EMAIL			       POSTED
---------- ----------------------------------- ----------
     52790 dbrager@mindspring.com
     37461 dbraun@scdt.intel.com
     52791 dbrenner@flash.net
     47177 dbronz@free.polbox.pl
     37296 dbrouse@enter.net
     47178 dbrown@cyberhighway.net
     36985 dbrown@uniden.com		       1998-03-05
     36985 dbrown@uniden.com		       1998-03-10
     34283 dbs117@amaze.net
     52792 dbsikorski@yahoo.com
...

The plus sign after classified_ads.user_id is our instruction to Oracle to "add NULL rows if you can't meet this JOIN constraint".

Extending a simple query into a JOIN

Suppose that you have a query from one table returning almost everything that you need, except for one column that's in another table. Here's a way to develop the JOIN without risking breaking your application:

add the new table to your FROM clause
add a WHERE constraint to prevent Oracle from building a Cartesian product
hunt for ambiguous column names in the SELECT list and other portions of the query; prefix these with table names if necessary
test that you've not broken anything in your zeal to add additional info
add a new column to the SELECT list

Here's an example from Problem Set 2 of a course that we give at MIT (see http://www.photo.net/teaching/psets/ps2/ps2.adp). Students build a conference room reservation system. They generally define two tables: rooms and reservations. The top level page is supposed to show a user what reservations he or she is current holding:


select room_id, start_time, end_time
from reservations
where user_id = 37

This produces an unacceptable page because the rooms are referred to by an ID number rather than by name. The name information is in the rooms table, so we'll have to turn this into a JOIN.

Step 1: add the new table to the FROM clause


select room_id, start_time, end_time
from reservations, rooms
where user_id = 37

We're in a world of hurt because Oracle is now going to join every row in rooms with every row in reservations where the user_id matches that of the logged-in user.

Step 2: add a constraint to the WHERE clause


select room_id, start_time, end_time
from reservations, rooms
where user_id = 37
and reservations.room_id = rooms.room_id

Step 3: look for ambiguously defined columns

Both reservations and rooms contain columns called "room_id". So we need to prefix the room_id column in the SELECT list with "reservations.". Note that we don't have to prefix start_time and end_time because these columns are only present in reservations.


select reservations.room_id, start_time, end_time
from reservations, rooms
where user_id = 37
and reservations.room_id = rooms.room_id

Step 4: test

Test the query to make sure that you haven't broken anything. You should get back the same rows with the same columns as before.

Step 5: add a new column to the SELECT list

We're finally ready to do what we set out to do: add room_name to the list of columns for which we're querying.


select reservations.room_id, start_time, end_time, rooms.room_name
from reservations, rooms
where user_id = 37
and reservations.room_id = rooms.room_id

Reference

Oracle8 Server SQL Reference, SELECT command section

---

based on SQL for Web Nerds

Categories: beginner (Audience)

Data modeling

Created by Anett Szabo, last modified by Malte Sussdorff 25 Jul 2007, at 05:57 PM

Data modeling is the hardest and most important activity in the RDBMS world. If you get the data model wrong, your application might not do what users need, it might be unreliable, it might fill up the database with garbage. Why then do we start a SQL tutorial with the most challenging part of the job? Because you can't do queries, inserts, and updates until you've defined some tables. And defining tables is data modeling.

When data modeling, you are telling the RDBMS the following:

what elements of the data you will store
how large each element can be
what kind of information each element can contain
what elements may be left blank
which elements are constrained to a fixed range
whether and how various tables are to be linked

Three-Valued Logic

Programmers in most computer languages are familiar with Boolean logic. A variable may be either true or false. Pervading SQL, however, is the alien idea of three-valued logic. A column can be true, false, or NULL. When building the data model you must affirmatively decide whether a NULL value will be permitted for a column and, if so, what it means.

For example, consider a table for recording user-submitted comments to a Web site. The publisher has made the following stipulations:

comments won't go live until approved by an editor
the admin pages will present editors with all comments that are pending approval, i.e., have been submitted but neither approved nor disapproved by an editor already

Here's the data model:


create table user_submitted_comments (
	comment_id		integer primary key,
	user_id			not null references users,
	submission_time		date default sysdate not null,
	ip_address		varchar(50) not null,
	content			clob,
	approved_p		char(1) check(approved_p in ('t','f'))
);

Implicit in this model is the assumption that approved_p can be NULL and that, if not explicitly set during the INSERT, that is what it will default to. What about the check constraint? It would seem to restrict approved_p to values of "t" or "f". NULL, however, is a special value and if we wanted to prevent approved_p from taking on NULL we'd have to add an explicit not null constraint.

How do NULLs work with queries? Let's fill user_submitted_comments with some sample data and see:


insert into user_submitted_comments
(comment_id, user_id, ip_address, content)
values
(1, 23069, '18.30.2.68', 'This article reminds me of Hemingway');

Table created.

SQL> select first_names, last_name, content, user_submitted_comments.approved_p
from user_submitted_comments, users 
where user_submitted_comments.user_id = users.user_id;

FIRST_NAMES  LAST_NAME	     CONTENT				  APPROVED_P
------------ --------------- ------------------------------------ ------------
Philip	     Greenspun	     This article reminds me of Hemingway

We've successfully JOINed the user_submitted_comments and users table to get both the comment content and the name of the user who submitted it. Notice that in the select list we had to explicitly request user_submitted_comments.approved_p. This is because the users table also has an approved_p column.

When we inserted the comment row we did not specify a value for the approved_p column. Thus we expect that the value would be NULL and in fact that's what it seems to be. Oracle's SQL*Plus application indicates a NULL value with white space.

For the administration page, we'll want to show only those comments where the approved_p column is NULL:


SQL> select first_names, last_name, content, user_submitted_comments.approved_p
from user_submitted_comments, users 
where user_submitted_comments.user_id = users.user_id
and user_submitted_comments.approved_p = NULL;

no rows selected

"No rows selected"? That's odd. We know for a fact that we have one row in the comments table and that is approved_p column is set to NULL. How to debug the query? The first thing to do is simplify by removing the JOIN:


SQL> select * from user_submitted_comments where approved_p = NULL;

no rows selected

What is happening here is that any expression involving NULL evaluates to NULL, including one that effectively looks like "NULL = NULL". The WHERE clause is looking for expressions that evaluate to true. What you need to use is the special test IS NULL:


SQL> select * from user_submitted_comments where approved_p is NULL;

COMMENT_ID    USER_ID SUBMISSION_T IP_ADDRESS
---------- ---------- ------------ ----------
CONTENT 			     APPROVED_P
------------------------------------ ------------
	 1	23069 2000-05-27   18.30.2.68
This article reminds me of Hemingway

An adage among SQL programmers is that the only time you can use "= NULL" is in an UPDATE statement (to set a column's value to NULL). It never makes sense to use "= NULL" in a WHERE clause.

The bottom line is that as a data modeler you will have to decide which columns can be NULL and what that value will mean.

Back to the Mailing List

Let's return to the mailing list data model from the introduction:


create table mailing_list (
        email           varchar(100) not null primary key,
        name            varchar(100)
);

create table phone_numbers (
        email           varchar(100) not null references mailing_list,
        number_type     varchar(15) check (number_type in ('work','home','cell','beeper')),
        phone_number    varchar(20) not null
);

This data model locks you into some realities:

You will not be sending out any physical New Year's cards to folks on your mailing list; you don't have any way to store their addresses.
You will not be sending out any electronic mail to folks who work at companies with elaborate Lotus Notes configurations; sometimes Lotus Notes results in email addresses that are longer than 100 characters.
You are running the risk of filling the database with garbage since you have not constrained phone numbers in any way. American users could add or delete digits by mistake. International users could mistype country codes.
You are running the risk of not being able to serve rich people because the number_type column may be too constrained. Suppose William H. Gates the Third wishes to record some extra phone numbers with types of "boat", "ranch", "island", and "private_jet". The check (number_type in ('work','home','cell','beeper')) statement prevents Mr. Gates from doing this.
You run the risk of having records in the database for people whose name you don't know, since the name column of mailing_list is free to be NULL.
Changing a user's email address won't be the simplest possible operation. You're using email as a key in two tables and therefore will have to update both tables. The references mailing_list keeps you from making the mistake of only updating mailing_list and leaving orphaned rows in phone_numbers. But if users changed their email addresses frequently, you might not want to do things this way.
Since you've no provision for storing a password or any other means of authentication, if you allow users to update their information, you run a minor risk of allowing a malicious change. (The risk isn't as great as it seems because you probably won't be publishing the complete mailing list; an attacker would have to guess the names of people on your mailing list.)

These aren't necessarily bad realities in which to be locked. However, a good data modeler recognizes that every line of code in the .sql file has profound implications for the Web service.

To get some more information on how a simple datamodel for a Discussion Forum can evolve, read en:sql-wn-data_modeling-philip

Representing Web Site Core Content

Free-for-all Internet discussions can often be useful and occasionally are compelling, but the anchor of a good Web site is usually a set of carefully authored extended documents. Historically these have tended to be stored in the Unix file system and they don't change too often. Hence I refer to them as static pages. Examples of static pages on the photo.net server include this book chapter, the tutorial on light for photographers at http://www.photo.net/making-photographs/light.

We have some big goals to consider. We want the data in the database to

help community experts figure out which articles need revision and which new articles would be most valued by the community at large.
help contributors work together on a draft article or a new version of an old article.
collect and organize reader comments and discussion, both for presentation to other readers but also to assist authors in keeping content up-to-date.
collect and organize reader-submitted suggestions of related content out on the wider Internet (i.e., links).
help point readers to new or new-to-them content that might interest them, based on what they've read before or based on what kind of content they've said is interesting.

The big goals lead to some more concrete objectives:

We will need a table that holds the static pages themselves.
Since there are potentially many comments per page, we need a separate table to hold the user-submitted comments.
Since there are potentially many related links per page, we need a separate table to hold the user-submitted links.
Since there are potentially many authors for one page, we need a separate table to register the author-page many-to-one relation.
Considering the "help point readers to stuff that will interest them" objective, it seems that we need to store the category or categories under which a page falls. Since there are potentially many categories for one page, we need a separate table to hold the mapping between pages and categories.


create table static_pages (
	page_id		integer not null primary key,
	url_stub	varchar(400) not null unique,
	original_author	integer references users(user_id),
	page_title	varchar(4000),
	page_body	clob,
	obsolete_p	char(1) default 'f' check (obsolete_p in ('t','f')),
	members_only_p	char(1) default 'f' check (members_only_p in ('t','f')),
	price		number,
	copyright_info	varchar(4000),
	accept_comments_p	char(1) default 't' check (accept_comments_p in ('t','f')),
	accept_links_p		char(1) default 't' check (accept_links_p in ('t','f')),
	last_updated		date,
	-- used to prevent minor changes from looking like new content
	publish_date		date
);

create table static_page_authors (
	page_id		integer not null references static_pages,
	user_id		integer not null references users,
	notify_p	char(1) default 't' check (notify_p in ('t','f')),
	unique(page_id,user_id)
);

Note that we use a generated integer page_id key for this table. We could key the table by the url_stub (filename), but that would make it very difficult to reorganize files in the Unix file system (something that should actually happen very seldom on a Web server; it breaks links from foreign sites).

How to generate these unique integer keys when you have to insert a new row into static_pages? You could

lock the table
find the maximum page_id so far
add one to create a new unique page_id
insert the row
commit the transaction (releases the table lock)

Much better is to use Oracle's built-in sequence generation facility:


create sequence page_id_sequence start with 1;

Then we can get new page IDs by using page_id_sequence.nextval in INSERT statements (see the Transactions chapter for a fuller discussion of sequences).

Reference

Here is a summary of the data modeling tools available to you in Oracle, each hyperlinked to the Oracle documentation. This reference section covers the following:

data types en:sql-wn-data_modeling_data_types
statements for creating, altering, and dropping tables en:sql-wn-data_modeling_tables
constraints en:sql-wn-data_modeling_contraints

---

based on SQL for Web Nerds

Categories: developer (Audience)

Triggers

Created by Anett Szabo, last modified by Malte Sussdorff 25 Jul 2007, at 05:56 PM

A trigger is a fragment of code that you tell Oracle to run before or after a table is modified. A trigger has the power to

make sure that a column is filled in with default information
make sure that an audit row is inserted into another table
after finding that the new information is inconsistent with other stuff in the database, raise an error that will cause the entire transaction to be rolled back

Remember the mailing lists example from the beginning?

Suppose that you've been using the above data model to collect the names of Web site readers who'd like to be alerted when you add new articles. You haven't sent any notices for two months. You want to send everyone who signed up in the last two months a "Welcome to my Web service; thanks for signing up; here's what's new" message. You want to send the older subscribers a simple "here's what's new" message. But you can't do this because you didn't store a registration date. It is easy enough to fix the table:


alter table mailing_list add (registration_date date);

But what if you have 15 different Web scripts that use this table? The ones that query it aren't a problem. If they don't ask for the new column, they won't get it and won't realize that the table has been changed (this is one of the big selling features of the RDBMS). But the scripts that update the table will all need to be changed. If you miss a script, you're potentially stuck with a table where various random rows are missing critical information.

Oracle has a solution to your problem: triggers. A trigger is a way of telling Oracle "any time anyone touches this table, I want you to execute the following little fragment of code". Here's how we define the trigger mailing_list_registration_date:


create trigger mailing_list_registration_date
before insert on mailing_list
for each row
when (new.registration_date is null)
begin
 :new.registration_date := sysdate;
end;

Note that the trigger only runs when someone is trying to insert a row with a NULL registration date. If for some reason you need to copy over records from another database and they have a registration date, you don't want this trigger overwriting it with the date of the copy.

A second point to note about this trigger is that it runs for each row. This is called a "row-level trigger" rather than a "statement-level trigger", which runs once per transaction, and is usually not what you want.

A third point is that we're using the magic Oracle procedure sysdate, which will return the current time. The Oracle date type is precise to the second even though the default is to display only the day.

A fourth point is that, starting with Oracle 8, we could have done this more cleanly by adding a default sysdate instruction to the column's definition.

The final point worth noting is the :new. syntax. This lets you refer to the new values being inserted. There is an analogous :old. feature, which is useful for update triggers:


create or replace trigger mailing_list_update
before update on mailing_list
for each row
when (new.name <> old.name)
begin
  -- user is changing his or her name
  -- record the fact in an audit table
  insert into mailing_list_name_changes
    (old_name, new_name)
  values
    (:old.name, :new.name);
end;
/
show errors

This time we used the create or replace syntax. This keeps us from having to drop trigger mailing_list_update if we want to change the trigger definition. We added a comment using the SQL comment shortcut "--". The syntax new. and old. is used in the trigger definition, limiting the conditions under which the trigger runs. Between the begin and end, we're in a PL/SQL block. This is Oracle's procedural language, described later, in which new.name would mean "the name element from the record in new". So you have to use :new instead. It is obscurities like this that lead to competent Oracle consultants being paid $200+ per hour.

The "/" and show errors at the end are instructions to Oracle's SQL*Plus program. The slash says "I'm done typing this piece of PL/SQL, please evaluate what I've typed." The "show errors" says "if you found anything to object to in what I just typed, please tell me".

Also consider the general_comments table:


create table general_comments (
	comment_id		integer primary key,
	on_what_id		integer not null,
	on_which_table		varchar(50),
	user_id			not null references users,
	comment_date		date not null,
	ip_address		varchar(50) not null,
	modified_date	date not null,
	content			clob,
	-- is the content in HTML or plain text (the default)
	html_p			char(1) default 'f' check(html_p in ('t','f')),
	approved_p		char(1) default 't' check(approved_p in ('t','f'))
);

Users and administrators are both able to edit comments. We want to make sure that we know when a comment was last modified so that we can offer the administrator a "recently modified comments page". Rather than painstakingly go through all of our Web scripts that insert or update comments, we can specify an invariant in Oracle that "after every time someone touches the general_comments table, make sure that the modified_date column is set equal to the current date-time." Here's the trigger definition:


create trigger general_comments_modified
before insert or update on general_comments
for each row
begin
 :new.modified_date := sysdate;
end;
/
show errors

We're using the PL/SQL programming language, discussed in the procedural language chapter. In this case, it is a simple begin-end block that sets the :new value of modified_date to the result of calling the sysdate function.

When using SQL*Plus, you have to provide a / character to get the program to evaluate a trigger or PL/SQL function definition. You then have to say "show errors" if you want SQL*Plus to print out what went wrong. Unless you expect to write perfect code all the time, it can be convenient to leave these SQL*Plus incantations in your .sql files.

An Audit Table Example

The canonical trigger example is the stuffing of an audit table. For example, in the data warehouse section of the ArsDigita Community System, we keep a table of user queries. Normally the SQL code for a query is kept in a query_columns table. However, sometimes a user might hand edit the generated SQL code, in which case we simply store that in the query_sqlqueries table. The SQL code for a query might be very important to a business and might have taken years to evolve. Even if we have good RDBMS backups, we don't want it getting erased because of a careless mouse click. So we add a queries_audit table to keep historical values of the query_sql column:


create table queries (
	query_id	integer primary key,
	query_name	varchar(100) not null,
	query_owner	not null references users,
	definition_time	date not null,
	-- if this is non-null, we just forget about all the query_columns
	-- stuff; the user has hand edited the SQL
	query_sql	varchar(4000)
);

create table queries_audit (
	query_id	integer not null,
	audit_time	date not null,
	query_sql	varchar(4000)
);

Note first that queries_audit has no primary key. If we were to make query_id the primary key, we'd only be able to store one history item per query, which is not our intent.

How to keep this table filled? We could do it by making sure that every Web script that might update the query_sql column inserts a row in queries_audit when appropriate. But how to enforce this after we've handed off our code to other programmers? Much better to let the RDBMS enforce the auditing:


create or replace trigger queries_audit_sql
before update on queries
for each row
when (old.query_sql is not null and (new.query_sql is null or old.query_sql <> new.query_sql))
begin
  insert into queries_audit (query_id, audit_time, query_sql)
  values
  (:old.query_id, sysdate, :old.query_sql);
end;

The structure of a row-level trigger is the following:


CREATE OR REPLACE TRIGGER ***trigger name***
***when*** ON ***which table***
FOR EACH ROW
***conditions for firing***
begin
  ***stuff to do***
end;

Let's go back and look at our trigger:

It is named queries_audit_sql; this is really of no consequence so long as it doesn't conflict with the names of other triggers.
It will be run before update, i.e., only when someone is executing an SQL UPDATE statement.
It will be run only when someone is updating the table queries.
It will be run only when the old value of query_sql is not null; we don't want to fill our audit table with NULLs.
It will be run only when the new value of query_sql is different from the old value; we don't want to fill our audit table with rows because someone happens to be updating another column in queries. Note that SQL's three-valued logic forces us to put in an extra test for new.query_sql is null because old.query_sql <> new.query_sql will not evaluate to true when new.query_sql is NULL (a user wiping out the custom SQL altogether; a very important case to audit).

Creating More Elaborate Constraints with Triggers

The default Oracle mechanisms for constraining data are not always adequate. For example, the ArsDigita Community System auction module has a table called au_categories. The category_keyword column is a unique shorthand way of referring to a category in a URL. However, this column may be NULL because it is not the primary key to the table. The shorthand method of referring to the category is optional.


create table au_categories (
        category_id             integer primary key,
        -- shorthand for referring to this category,
        -- e.g. "bridges", for use in URLs
        category_keyword        varchar(30),
        -- human-readable name of this category,
        -- e.g. "All types of bridges"
        category_name           varchar(128) not null
);

We can't add a UNIQUE constraint to the category_keyword column. That would allow the table to only have one row where category_keyword was NULL. So we add a trigger that can execute an arbitrary PL/SQL expression and raise an error to prevent an INSERT if necessary:


create or replace trigger au_category_unique_tr
before insert
on au_categories
for each row
declare
        existing_count integer;
begin
        select count(*) into existing_count from au_categories
            where category_keyword = :new.category_keyword;
        if existing_count > 0
        then
            raise_application_error(-20010, 'Category keywords must be unique if used');
        end if;
end;

This trigger queries the table to find out if there are any matching keywords already inserted. If there are, it calls the built-in Oracle procedure raise_application_error to abort the transaction.

Reference

Oracle Application Developer's Guide, Using Database Triggers

---

based on SQL for Web Nerds

Categories: developer (Audience)

Constraints

Created by Malte Sussdorff, last modified by Malte Sussdorff 25 Jul 2007, at 05:55 PM

Constraints

When you're defining a table, you can constrain single rows by adding some magic words after the data type:

not null; requires a value for this column
unique; two rows can't have the same value in this column (side effect in Oracle: creates an index)
primary key; same as unique except that no row can have a null value for this column and other tables can refer to this column
check; limit the range of values for column, e.g., rating integer check(rating > 0 and rating <= 10)
references; this column can only contain values present in another table's primary key column, e.g., user_id not null references users in the bboard table forces the user_id column to only point to valid users. An interesting twist is that you don't have to give a data type for user_id; Oracle assigns this column to whatever data type the foreign key has (in this case integer).

Constraints can apply to multiple columns:


create table static_page_authors (
	page_id		integer not null references static_pages,
	user_id		integer not null references users,
	notify_p	char(1) default 't' check (notify_p in ('t','f')),
	unique(page_id,user_id)
);

Oracle will let us keep rows that have the same page_id and rows that have the same user_id but not rows that have the same value in both columns (which would not make sense; a person can't be the author of a document more than once). Suppose that you run a university distinguished lecture series. You want speakers who are professors at other universities or at least PhDs. On the other hand, if someone controls enough money, be it his own or his company's, he's in. Oracle stands ready:


create table distinguished_lecturers (
	lecturer_id		integer primary key,
	name_and_title		varchar(100),
	personal_wealth		number,
	corporate_wealth	number,
	check (instr(upper(name_and_title),'PHD') <> 0 
               or instr(upper(name_and_title),'PROFESSOR') <> 0 
               or (personal_wealth + corporate_wealth) > 1000000000)
);

insert into distinguished_lecturers 
values
(1,'Professor Ellen Egghead',-10000,200000);

1 row created.

insert into distinguished_lecturers 
values
(2,'Bill Gates, innovator',75000000000,18000000000);

1 row created.

insert into distinguished_lecturers 
values
(3,'Joe Average',20000,0);

ORA-02290: check constraint (PHOTONET.SYS_C001819) violated

As desired, Oracle prevented us from inserting some random average loser into the distinguished_lecturers table, but the error message was confusing in that it refers to a constraint given the name of "SYS_C001819" and owned by the PHOTONET user. We can give our constraint a name at definition time:


create table distinguished_lecturers (
	lecturer_id		integer primary key,
	name_and_title		varchar(100),
	personal_wealth		number,
	corporate_wealth	number,
	constraint ensure_truly_distinguished
        check (instr(upper(name_and_title),'PHD') <> 0 
               or instr(upper(name_and_title),'PROFESSOR') <> 0 
               or (personal_wealth + corporate_wealth) > 1000000000)
);

insert into distinguished_lecturers 
values
(3,'Joe Average',20000,0);

ORA-02290: check constraint (PHOTONET.ENSURE_TRULY_DISTINGUISHED) violated

Now the error message is easier to understand by application programmers.

Categories: developer (Audience)

Tables

Created by Malte Sussdorff, last modified by Malte Sussdorff 25 Jul 2007, at 05:54 PM

THIS NEEDS TO BE AMMENDED FOR POSTGRESQL

The basics:


CREATE TABLE your_table_name (
       the_key_column	     key_data_type PRIMARY KEY,
       a_regular_column	     a_data_type,
       an_important_column   a_data_type NOT NULL,
       ... up to 996 intervening columns in Oracle8 ...
       the_last_column       a_data_type
);

Even in a simple example such as the one above, there are few items worth noting. First, I like to define the key column(s) at the very top. Second, the primary key constraint has some powerful effects. It forces the_key_column to be non-null. It causes the creation of an index on the_key_column, which will slow down updates to your_table_name but improve the speed of access when someone queries for a row with a particular value of the_key_column. Oracle checks this index when inserting any new row and aborts the transaction if there is already a row with the same value for the_key_column. Third, note that there is no comma following the definition of the last row. If you are careless and leave the comma in, Oracle will give you a very confusing error message.

If you didn't get it right the first time, you'll probably want to


alter table your_table_name add (new_column_name a_data_type any_constraints);


alter table your_table_name modify (existing_column_name new_data_type new_constraints);

In Oracle 8i you can drop a column:


alter table your_table_name drop column existing_column_name;

(see http://www.oradoc.com/keyword/drop_column).

If you're still in the prototype stage, you'll probably find it easier to simply


drop table your_table_name;

and recreate it. At any time, you can see what you've got defined in the database by querying Oracle's Data Dictionary:


SQL> select table_name from user_tables order by table_name;

TABLE_NAME
------------------------------
ADVS
ADV_CATEGORIES
ADV_GROUPS
ADV_GROUP_MAP
ADV_LOG
ADV_USER_MAP
AD_AUTHORIZED_MAINTAINERS
AD_CATEGORIES
AD_DOMAINS
AD_INTEGRITY_CHECKS
BBOARD
...
STATIC_CATEGORIES
STATIC_PAGES
STATIC_PAGE_AUTHORS
USERS
...

after which you will typically type

describe
table_name_of_interest

in SQL*Plus:


SQL> describe users;
 Name				 Null?	  Type
 ------------------------------- -------- ----
 USER_ID			 NOT NULL NUMBER(38)
 FIRST_NAMES			 NOT NULL VARCHAR2(100)
 LAST_NAME			 NOT NULL VARCHAR2(100)
 PRIV_NAME				  NUMBER(38)
 EMAIL				 NOT NULL VARCHAR2(100)
 PRIV_EMAIL				  NUMBER(38)
 EMAIL_BOUNCING_P			  CHAR(1)
 PASSWORD			 NOT NULL VARCHAR2(30)
 URL					  VARCHAR2(200)
 ON_VACATION_UNTIL			  DATE
 LAST_VISIT				  DATE
 SECOND_TO_LAST_VISIT			  DATE
 REGISTRATION_DATE			  DATE
 REGISTRATION_IP			  VARCHAR2(50)
 ADMINISTRATOR_P			  CHAR(1)
 DELETED_P				  CHAR(1)
 BANNED_P				  CHAR(1)
 BANNING_USER				  NUMBER(38)
 BANNING_NOTE				  VARCHAR2(4000)

Note that Oracle displays its internal data types rather than the ones you've given, e.g., number(38) rather than integer and varchar2 instead of the specified varchar.

References to this Page: en:sql-wn-data_modeling Categories: developer (Audience)

Data Types (Oracle)

Created by Malte Sussdorff, last modified by Malte Sussdorff 25 Jul 2007, at 05:51 PM

Data Types

For each column that you define for a table, you must specify the data type of that column. Here are your options:

Character Data

char(n) A fixed-length character string, e.g., char(200) will take up 200 bytes regardless of how long the string actually is. This works well when the data truly are of fixed size, e.g., when you are recording a user's sex as "m" or "f". This works badly when the data are of variable length. Not only does it waste space on the disk and in the memory cache, but it makes comparisons fail. For example, suppose you insert "rating" into a comment_type column of type char(30) and then your Tcl program queries the database. Oracle sends this column value back to procedural language clients padded with enough spaces to make up 30 total characters. Thus if you have a comparison within Tcl of whether $comment_type == "rating", the comparison will fail because $comment_type is actually "rating" followed by 24 spaces.

The maximum length char in Oracle8 is 2000 bytes.

varchar(n) A variable-length character string, up to 4000 bytes long in Oracle8. These are stored in such a way as to minimize disk space usage, i.e., if you only put one character into a column of type varchar(4000), Oracle only consumes two bytes on disk. The reason that you don't just make all the columns varchar(4000) is that the Oracle indexing system is limited to indexing keys of about 700 bytes.

clob A variable-length character string, up to 4 gigabytes long in Oracle8. The CLOB data type is useful for accepting user input from such applications as discussion forums. Sadly, Oracle8 has tremendous limitations on how CLOB data may be inserted, modified, and queried. Use varchar(4000) if you can and prepare to suffer if you can't.
In a spectacular demonstration of what happens when companies don't follow the lessons of The Mythical Man Month, the regular string functions don't work on CLOBs. You need to call identically named functions in the DBMS_LOB package. These functions take the same arguments but in different orders. You'll never be able to write a working line of code without first reading the DBMS_LOB section of the Oracle8 Server Application Developer's Guide.

nchar, nvarchar, nclob The n prefix stands for "national character set". These work like char, varchar, and clob but for multi-byte characters (e.g., Unicode; see http://www.unicode.org).

Numeric Data

number Oracle actually only has one internal data type that is used for storing numbers. It can handle 38 digits of precision and exponents from -130 to +126. If you want to get fancy, you can specify precision and scale limits. For example, number(3,0) says "round everything to an integer [scale 0] and accept numbers than range from -999 to +999". If you're American and commercially minded, number(9,2) will probably work well for storing prices in dollars and cents (unless you're selling stuff to Bill Gates, in which case the billion dollar limit imposed by the precision of 9 might prove constraining). If you are Bill Gates, you might not want to get distracted by insignificant numbers: Tell Oracle to round everything to the nearest million with number(38,-6).

integer In terms of storage consumed and behavior, this is not any different from number(38) but I think it reads better and it is more in line with ANSI SQL (which would be a standard if anyone actually implemented it).

Dates and Date/Time Intervals (Version 9i and newer)

timestamp A point in time, recorded with sub-second precision. When creating a column you specify the number of digits of precision beyond one second from 0 (single second precision) to 9 (nanosecond precision). Oracle's calendar can handle dates between between January 1, 4712 BC and December 31, 9999 AD. You can put in values with the to_timestamp function and query them out using the to_char function. Oracle offers several variants of this datatype for coping with data aggregated across multiple timezones.

interval year to month An amount of time, expressed in years and months.
interval day to second An amount of time, expressed in days, hours, minutes, and seconds. Can be precise down to the nanosecond if desired.

Dates and Date/Time Intervals (Versions 8i and earlier)

date Obsolete as of version 9i. A point in time, recorded with one-second precision, between January 1, 4712 BC and December 31, 4712 AD. You can put in values with the to_date function and query them out using the to_char function. If you don't use these functions, you're limited to specifying the date with the default system format mask, usually 'DD-MON-YY'. This is a good recipe for a Year 2000 bug since January 23, 2000 would be '23-JAN-00'. On better-maintained systems, this is often the ANSI default: 'YYYY-MM-DD', e.g., '2000-01-23' for January 23, 2000.

number Hey, isn't this a typo? What's number doing in the date section? It is here because this is how Oracle versions prior to 9i represented date-time intervals, though their docs never say this explicitly. If you add numbers to dates, you get new dates. For example, tomorrow at exactly this time is sysdate+1. To query for stuff submitted in the last hour, you limit to submitted_date > sysdate - 1/24.
Binary Data

blob BLOB stands for "Binary Large OBject". It doesn't really have to be all that large, though Oracle will let you store up to 4 GB. The BLOB data type was set up to permit the storage of images, sound recordings, and other inherently binary data. In practice, it also gets used by fraudulent application software vendors. They spend a few years kludging together some nasty format of their own. Their MBA executive customers demand that the whole thing be RDBMS-based. The software vendor learns enough about Oracle to "stuff everything into a BLOB". Then all the marketing and sales folks are happy because the application is now running from Oracle instead of from the file system. Sadly, the programmers and users don't get much because you can't use SQL very effectively to query or update what's inside a BLOB.

bfile A binary file, stored by the operating system (typically Unix) and kept track of by Oracle. These would be useful when you need to get to information both from SQL (which is kept purposefully ignorant about what goes on in the wider world) and from an application that can only read from standard files (e.g., a typical Web server). The bfile data type is pretty new but to my mind it is already obsolete: Oracle 8.1 (8i) lets external applications view content in the database as though it were a file on a Windows NT server. So why not keep everything as a BLOB and enable Oracle's Internet File System?

Character Data
char(n)	A fixed-length character string, e.g., `char(200)` will take up 200 bytes regardless of how long the string actually is. This works well when the data truly are of fixed size, e.g., when you are recording a user's sex as "m" or "f". This works badly when the data are of variable length. Not only does it waste space on the disk and in the memory cache, but it makes comparisons fail. For example, suppose you insert "rating" into a `comment_type` column of type `char(30)` and then your Tcl program queries the database. Oracle sends this column value back to procedural language clients padded with enough spaces to make up 30 total characters. Thus if you have a comparison within Tcl of whether `$comment_type == "rating"`, the comparison will fail because `$comment_type` is actually "rating" followed by 24 spaces. The maximum length char in Oracle8 is 2000 bytes.
varchar(n)	A variable-length character string, up to 4000 bytes long in Oracle8. These are stored in such a way as to minimize disk space usage, i.e., if you only put one character into a column of type `varchar(4000)`, Oracle only consumes two bytes on disk. The reason that you don't just make all the columns `varchar(4000)` is that the Oracle indexing system is limited to indexing keys of about 700 bytes.
clob	A variable-length character string, up to 4 gigabytes long in Oracle8. The CLOB data type is useful for accepting user input from such applications as discussion forums. Sadly, Oracle8 has tremendous limitations on how CLOB data may be inserted, modified, and queried. Use `varchar(4000)` if you can and prepare to suffer if you can't. In a spectacular demonstration of what happens when companies don't follow the lessons of The Mythical Man Month, the regular string functions don't work on CLOBs. You need to call identically named functions in the DBMS_LOB package. These functions take the same arguments but in different orders. You'll never be able to write a working line of code without first reading the DBMS_LOB section of the Oracle8 Server Application Developer's Guide.
nchar, nvarchar, nclob	The n prefix stands for "national character set". These work like char, varchar, and clob but for multi-byte characters (e.g., Unicode; see http://www.unicode.org).
Numeric Data
number	Oracle actually only has one internal data type that is used for storing numbers. It can handle 38 digits of precision and exponents from -130 to +126. If you want to get fancy, you can specify precision and scale limits. For example, `number(3,0)` says "round everything to an integer [scale 0] and accept numbers than range from -999 to +999". If you're American and commercially minded, `number(9,2)` will probably work well for storing prices in dollars and cents (unless you're selling stuff to Bill Gates, in which case the billion dollar limit imposed by the precision of 9 might prove constraining). If you are Bill Gates, you might not want to get distracted by insignificant numbers: Tell Oracle to round everything to the nearest million with `number(38,-6)`.
integer	In terms of storage consumed and behavior, this is not any different from `number(38)` but I think it reads better and it is more in line with ANSI SQL (which would be a standard if anyone actually implemented it).
Dates and Date/Time Intervals (Version 9i and newer)
timestamp	A point in time, recorded with sub-second precision. When creating a column you specify the number of digits of precision beyond one second from 0 (single second precision) to 9 (nanosecond precision). Oracle's calendar can handle dates between between January 1, 4712 BC and December 31, 9999 AD. You can put in values with the `to_timestamp` function and query them out using the `to_char` function. Oracle offers several variants of this datatype for coping with data aggregated across multiple timezones.
interval year to month	An amount of time, expressed in years and months.
interval day to second	An amount of time, expressed in days, hours, minutes, and seconds. Can be precise down to the nanosecond if desired.
Dates and Date/Time Intervals (Versions 8i and earlier)
date	Obsolete as of version 9i. A point in time, recorded with one-second precision, between January 1, 4712 BC and December 31, 4712 AD. You can put in values with the `to_date` function and query them out using the `to_char` function. If you don't use these functions, you're limited to specifying the date with the default system format mask, usually 'DD-MON-YY'. This is a good recipe for a Year 2000 bug since January 23, 2000 would be '23-JAN-00'. On better-maintained systems, this is often the ANSI default: 'YYYY-MM-DD', e.g., '2000-01-23' for January 23, 2000.
number	Hey, isn't this a typo? What's `number` doing in the date section? It is here because this is how Oracle versions prior to 9i represented date-time intervals, though their docs never say this explicitly. If you add numbers to dates, you get new dates. For example, tomorrow at exactly this time is `sysdate+1`. To query for stuff submitted in the last hour, you limit to `submitted_date > sysdate - 1/24`.
Binary Data
blob	BLOB stands for "Binary Large OBject". It doesn't really have to be all that large, though Oracle will let you store up to 4 GB. The BLOB data type was set up to permit the storage of images, sound recordings, and other inherently binary data. In practice, it also gets used by fraudulent application software vendors. They spend a few years kludging together some nasty format of their own. Their MBA executive customers demand that the whole thing be RDBMS-based. The software vendor learns enough about Oracle to "stuff everything into a BLOB". Then all the marketing and sales folks are happy because the application is now running from Oracle instead of from the file system. Sadly, the programmers and users don't get much because you can't use SQL very effectively to query or update what's inside a BLOB.
bfile	A binary file, stored by the operating system (typically Unix) and kept track of by Oracle. These would be useful when you need to get to information both from SQL (which is kept purposefully ignorant about what goes on in the wider world) and from an application that can only read from standard files (e.g., a typical Web server). The bfile data type is pretty new but to my mind it is already obsolete: Oracle 8.1 (8i) lets external applications view content in the database as though it were a file on a Windows NT server. So why not keep everything as a BLOB and enable Oracle's Internet File System?

Despite this plethora of data types, Oracle has some glaring holes that torture developers. For example, there is no Boolean data type. A developer who needs an approved_p column is forced to use char(1) check(this_column in ('t','f')) and then, instead of the clean query where approved_p is forced into where approved_p = 't'.

Oracle8 includes a limited ability to create your own data types. Covering these is beyond the scope of this book. See Oracle8 Server Concepts, User-Defined Datatypes.

References to this Page: en:sql-wn-data_modeling Categories: developer (Audience)

The Discussion Forum -- philg's personal odyssey

Created by Malte Sussdorff, last modified by Malte Sussdorff 25 Jul 2007, at 05:50 PM

Back in 1995, I built a threaded discussion forum, described ad nauseum in http://philip.greenspun.com/wtr/dead-trees/53013.htm. Here's how I stored the postings:


create table bboard (
        msg_id          char(6) not null primary key,
        refers_to       char(6),
        email           varchar(200),
        name            varchar(200),
        one_line        varchar(700),
        message         clob,
        notify          char(1) default 'f' check (notify in ('t','f')),
        posting_time    date,
        sort_key        varchar(600)
);

German order reigns inside the system itself: messages are uniquely keyed with msg_id, refer to each other (i.e., say "I'm a response to msg X") with refers_to, and a thread can be displayed conveniently by using the sort_key column.

Italian chaos is permitted in the email and name columns; users could remain anonymous, masquerade as "president@whitehouse.gov" or give any name.

This seemed like a good idea when I built the system. I was concerned that it work reliably. I didn't care whether or not users put in bogus content; the admin pages made it really easy to remove such postings and, in any case, if someone had something interesting to say but needed to remain anonymous, why should the system reject their posting?

One hundred thousand postings later, as the moderator of the photo.net Q&A forum, I began to see the dimensions of my data modeling mistakes.

First, anonymous postings and fake email addresses didn't come from Microsoft employees revealing the dark truth about their evil bosses. They came from complete losers trying and failing to be funny or wishing to humiliate other readers. Some fake addresses came from people scared by the rising tide of spam email (not a serious problem back in 1995).

Second, I didn't realize how the combination of my email alert systems, fake email addresses, and Unix mailers would result in my personal mailbox filling up with messages that couldn't be delivered to "asdf@asdf.com" or "duh@duh.net".

Although the solution involved changing some Web scripts, fundamentally the fix was add a column to store the IP address from which a post was made:


alter table bboard add (originating_ip	varchar(16));

Keeping these data enabled me to see that most of the anonymous posters were people who'd been using the forum for some time, typically from the same IP address. I just sent them mail and asked them to stop, explaining the problem with bounced email.

After four years of operating the photo.net community, it became apparent that we needed ways to

display site history for users who had changed their email addresses
discourage problem users from burdening the moderators and the community
carefully tie together user-contributed content in the various subsystems of photo.net

The solution was obvious to any experienced database nerd: a canonical users table and then content tables that reference it. Here's a simplified version of the data model, taken from a toolkit for building online communities, describe in http://philip.greenspun.com/panda/community:


create table users (
	user_id			integer not null primary key,
	first_names		varchar(100) not null,
	last_name		varchar(100) not null,
	email			varchar(100) not null unique,
        ..
);

create table bboard (
	msg_id		char(6) not null primary key,
	refers_to	char(6),
	topic		varchar(100) not null references bboard_topics,
	category	varchar(200),	-- only used for categorized Q&A forums
	originating_ip	varchar(16),	-- stored as string, separated by periods
	user_id		integer not null references users,
	one_line	varchar(700),
	message		clob,
	-- html_p - is the message in html or not
	html_p		char(1) default 'f' check (html_p in ('t','f')),
        ...
);

create table classified_ads (
	classified_ad_id	integer not null primary key,
	user_id			integer not null references users,
	...
);

Note that a contributor's name and email address no longer appear in the bboard table. That doesn't mean we don't know who posted a message. In fact, this data model can't even represent an anonymous posting: user_id integer not null references users requires that each posting be associated with a user ID and that there actually be a row in the users table with that ID.

First, let's talk about how much fun it is to move a live-on-the-Web 600,000 hit/day service from one data model to another. In this case, note that the original bboard data model had a single name column. The community system has separate columns for first and last names. A conversion script can easily split up "Joe Smith" but what is it to do with William Henry Gates III?

How do we copy over anonymous postings? Remember that Oracle is not flexible or intelligent. We said that we wanted every row in the bboard table to reference a row in the users table. Oracle will abort any transaction that would result in a violation of this integrity constraint. So we either have to drop all those anonymous postings (and any non-anonymous postings that refer to them) or we have to create a user called "Anonymous" and assign all the anonymous postings to that person. The technical term for this kind of solution is kludge.

A more difficult problem than anonymous postings is presented by long-time users who have difficulty typing and or keeping a job. Consider a user who has identified himself as

Joe Smith; jsmith@ibm.com
Jo Smith; jsmith@ibm.com (typo in name)
Joseph Smith; jsmth@ibm.com (typo in email)
Joe Smith; cantuseworkaddr@hotmail.com (new IBM policy)
Joe Smith-Jones; joe_smithjones@hp.com (got married, changed name, changed jobs)
Joe Smith-Jones; jsmith@somedivision.hp.com (valid but not canonical corporate email address)
Josephina Smith; jsmith@somedivision.hp.com (sex change; divorce)
Josephina Smith; josephina_smith@hp.com (new corporate address)
Siddhartha Bodhisattva; josephina_smith@hp.com (change of philosophy)
Siddhartha Bodhisattva; thinkwaitfast@hotmail.com (traveling for awhile to find enlightenment)

Contemporary community members all recognize these postings as coming from the same person but it would be very challenging even to build a good semi-automated means of merging postings from this person into one user record.

Once we've copied everything into this new normalized data model, notice that we can't dig ourselves into the same hole again. If a user has contributed 1000 postings, we don't have 1000 different records of that person's name and email address. If a user changes jobs, we need only update one column in one row in one table.

The html_p column in the new data model is worth mentioning. In 1995, I didn't understand the problems of user-submitted data. Some users will submit plain text, which seems simple, but in fact you can't just spit this out as HTML. If user A typed < or > characters, they might get swallowed by user B's Web browser. Does this matter? Consider that "<g>" is interpreted in various online circles as an abbreviation for "grin" but by Netscape Navigator as an unrecognized (and therefore ignore) HTML tag. Compare the meaning of

"We shouldn't think it unfair that Bill Gates has more wealth than the 100 million poorest Americans combined. After all, he invented the personal computer, the graphical user interface, and the Internet."

with

"We shouldn't think it unfair that Bill Gates has more wealth than the 100 million poorest Americans combined. After all, he invented the personal computer, the graphical user interface, and the Internet. <g>"

It would have been easy enough for me to make sure that such characters never got interpreted as markup. In fact, with AOLserver one can do it with a single call to the built-in procedure ns_quotehtml. However, consider the case where a nerd posts some HTML. Other users would then see

"For more examples of my brilliant thinking and modesty, check out <a href="http://philip.greenspun.com/">my home page</a>."

I discovered that the only real solution is to ask the user whether the submission is an HTML fragment or plain text, show the user an approval page where the content may be previewed, and then remember what the user told us in an html_p column in the database.

Is this data model perfect? Permanent? Absolutely. It will last for at least... Whoa! Wait a minute. I didn't know that Dave Clark was replacing his original Internet Protocol, which the world has been running since around 1980, with IPv6 (http://www.faqs.org/rfcs/rfc2460.html). In the near future, we'll have IP addresses that are 128 bits long. That's 16 bytes, each of which takes two hex characters to represent. So we need 32 characters plus at least 7 more for periods that separate the hex digits. We might also need a couple of characters in front to say "this is a hex representation". Thus our brand new data model in fact has a crippling deficiency. How easy is it to fix? In Oracle:


alter table bboard modify (originating_ip varchar(50));

You won't always get off this easy. Oracle won't let you shrink a column from a maximum of 50 characters to 16, even if no row has a value longer than 16 characters. Oracle also makes it tough to add a column that is constrained not null.

References to this Page: en:sql-wn-data_modeling Categories: developer (Audience)

Transactions

Created by Anett Szabo, last modified by Anett Szabo 25 Jul 2007, at 04:32 PM

In the introduction we covered some examples of inserting data into a database by typing at SQL*Plus:


insert into mailing_list (name, email)
values ('Philip Greenspun','philg@mit.edu');

Generally, this is not how it is done. As a programmer, you write code that gets executed every time a user submits a discussion forum posting or classified ad. The structure of the SQL statement remains fixed but not the string literals after the values.

The simplest and most direct interface to a relational database involves a procedural program in C, Java, Lisp, Perl, or Tcl putting together a string of SQL that is then sent to to the RDBMS. Here's how the ArsDigita Community System constructs a new entry in the clickthrough log:


insert into clickthrough_log 
 (local_url, foreign_url, entry_date, click_count)
values
 ('$local_url', '$foreign_url', trunc(sysdate), 1)"

The INSERT statement adds one row, filling in the four list columns. Two of the values come from local variables set within the Web server, $local_url and $foreign_url. Because these are strings, they must be surrounded by single quotes. One of the values is dynamic and comes straight from Oracle: trunc(sysdate). Recall that the date data type in Oracle is precise to the second. We only want one of these rows per day of the year and hence truncate the date to midnight. Finally, as this is the first clickthrough of the day, we insert a constant value of 1 for click_count.

Atomicity

Each SQL statement executes as an atomic transaction. For example, suppose that you were to attempt to purge some old data with


delete from clickthrough_log where entry_date + 120 < sysdate;

(delete clickthrough records more than 120 days old) and that 3500 rows in clickthrough_log are older than 120 days. If your computer failed halfway through the execution of this DELETE, i.e., before the transaction committed, you would find that none of the rows had been deleted. Either all 3500 rows will disappear or none will.

More interestingly, you can wrap a transaction around multiple SQL statements. For example, when a user is editing a comment, the ArsDigita Community System keeps a record of what was there before:


ns_db dml $db "begin transaction" 

# insert into the audit table
ns_db dml $db "insert into general_comments_audit
 (comment_id, user_id, ip_address, audit_entry_time, modified_date, content)
select comment_id, 
       user_id, 
       '[ns_conn peeraddr]', 
       sysdate, 
       modified_date, 
       content from general_comments 
where comment_id = $comment_id"

# change the publicly viewable table
ns_db dml $db "update general_comments
set content = '$QQcontent', 
    html_p = '$html_p'
where comment_id = $comment_id"

# commit the transaction
ns_db dml $db "end transaction"

This is generally referred to in the database industry as auditing. The database itself is used to keep track of what has been changed and by whom.

Let's look at these sections piece by piece. We're looking at a Tcl program calling AOLserver API procedures when it wants to talk to Oracle. We've configured the system to reverse the normal Oracle world order in which everything is within a transaction unless otherwise committed. The begin transaction and end transaction statements never get through to Oracle; they are merely instructions to our Oracle driver to flip Oracle out and then back into autocommit mode.

The transaction wrapper is imposed around two SQL statements. The first statement inserts a row into general_comments_audit. We could simply query the general_comments table from Tcl and then use the returned data to create a standard-looking INSERT. However, if what you're actually doing is moving data from one place within the RDBMS to another, it is extremely bad taste to drag it all the way out to an application program and then stuff it back in. Much better to use the "INSERT ... SELECT" form.

Note that two of the columns we're querying from general_comments don't exist in the table: sysdate and '[ns_conn peeraddr]'. It is legal in SQL to put function calls or constants in your select list, just as you saw at the beginning of the Queries chapter where we discussed Oracle's one-row system table: dual. To refresh your memory:


select sysdate from dual;

SYSDATE
----------
1999-01-14

You can compute multiple values in a single query:


select sysdate, 2+2, atan2(0, -1) from dual;

SYSDATE 	  2+2 ATAN2(0,-1)
---------- ---------- -----------
1999-01-14	    4  3.14159265

This approach is useful in the transaction above, where we combine information from a table with constants and function calls. Here's a simpler example:


select posting_time, 2+2 
from bboard 
where msg_id = '000KWj';

POSTING_TI	  2+2
---------- ----------
1998-12-13	    4

Let's get back to our comment editing transaction and look at the basic structure:

open a transaction
insert into an audit table whatever comes back from a SELECT statement on the comment table
update the comment table
close the transaction

Suppose that something goes wrong during the INSERT. The tablespace in which the audit table resides is full and it isn't possible to add a row. Putting the INSERT and UPDATE in the same RDBMS transactions ensures that if there is a problem with one, the other won't be applied to the database.

Consistency

Suppose that we've looked at a message on the bulletin board and decide that its content is so offensive we wish to delete the user from our system:


select user_id from bboard where msg_id = '000KWj';

   USER_ID
----------
     39685

delete from users where user_id = 39685;
*
ERROR at line 1:
ORA-02292: integrity constraint (PHOTONET.SYS_C001526) violated - child record
found

Oracle has stopped us from deleting user 39685 because to do so would leave the database in an inconsistent state. Here's the definition of the bboard table:


create table bboard (
	msg_id		char(6) not null primary key,
	refers_to	char(6),
        ...
	user_id		integer not null references users,
	one_line	varchar(700),
	message		clob,
	...
);

The user_id column is constrained to be not null. Furthermore, the value in this column must correspond to some row in the users table (references users). By asking Oracle to delete the author of msg_id 000KWj from the users table before we deleted all of his or her postings from the bboard table, we were asking Oracle to leave the RDBMS in an inconsistent state.

Mutual Exclusion

When you have multiple simultaneously executing copies of the same program, you have to think about mutual exclusion. If a program has to

read a value from the database
perform a computation based on that value
update the value in the database based on the computation

Then you want to make sure only one copy of the program is executing at a time through this segment.

The /bboard module of the ArsDigita Community System has to do this. The sequence is

read the last message ID from the msg_id_generator table
increment the message ID with a bizarre collection of Tcl scripts
update the last_msg_id column in the msg_id_generator table

First, anything having to do with locks only makes sense when the three operations are grouped together in a transaction. Second, to avoid deadlocks a transaction must acquire all the resources (including locks) that it needs at the start of the transaction. A SELECT in Oracle does not acquire any locks but a SELECT .. FOR UPDATE does. Here's the beginning of the transaction that inserts a message into the bboard table (from /bboard/insert-msg.tcl):


select last_msg_id
from msg_id_generator
for update of last_msg_id

Mutual Exclusion (the Big Hammer)

The for update clause isn't a panacea. For example, in the Action Network (described in Chapter 16 of Philip and Alex's Guide to Web Publishing), we need to make sure that a double-clicking user doesn't generate duplicate FAXes to politicians. The test to see if the user has already responded is


select count(*) from an_alert_log 
where member_id = $member_id
and entry_type = 'sent_response'
and alert_id = $alert_id

By default, Oracle locks one row at a time and doesn't want you to throw a FOR UPDATE clause into a SELECT COUNT(*). The implication of that would be Oracle recording locks on every row in the table. Much more efficient is simply to start the transaction with


lock table an_alert_log in exclusive mode

This is a big hammer and you don't want to hold a table lock for more than an instant. So the structure of a page that gets a table lock should be

open a transaction
lock table
select count(*)
if the count was 0, insert a row to record the fact that the user has responded
commit the transaction (releases the table lock)
proceed with the rest of the script
...

What if I just want some unique numbers?

Does it really have to be this hard? What if you just want some unique integers, each of which will be used as a primary key? Consider a table to hold news items for a Web site:


create table news (
	title		varchar(100) not null,
	body		varchar(4000) not null,
	release_date	date not null,
        ...
);

You might think you could use the title column as a key, but consider the following articles:


insert into news (title, body, release_date)
values
('French Air Traffic Controllers Strike',
 'A walkout today by controllers left travelers stranded..',
 '1995-12-14');

insert into news (title, body, release_date)
values
('French Air Traffic Controllers Strike',
 'Passengers at Orly faced 400 canceled flights ...',
 '1997-05-01');

insert into news (title, body, release_date)
values
('Bill Clinton Beats the Rap',
 'Only 55 senators were convinced that President Clinton obstructed justice ...',
 '1999-02-12');

insert into news (title, body, release_date)
values
('Bill Clinton Beats the Rap',
 'The sexual harassment suit by Paula Jones was dismissed ...',
 '1998-12-02);

It would seem that, at least as far as headlines are concerned, little of what is reported is truly new. Could we add

	primary key (title, release_date)

at the end of our table definition? Absolutely. But keying by title and date would result in some unwieldy URLs for editing or approving news articles. If your site allows public suggestions, you might find submissions from multiple users colliding. If you accept comments on news articles, a standard feature of the ArsDigita Community System, each comment must reference a news article. You'd have to be sure to update both the comments table and the news table if you needed to correct a typo in the title column or changed the release_date.

The traditional database design that gets around all of these problems is the use of a generated key. If you've been annoyed by having to carry around your student ID at MIT or your patient ID at a hospital, now you understand the reason why: the programmers are using generated keys and making their lives a bit easier by exposing this part of their software's innards.

Here's how the news module of the ArsDigita Community System works, in an excerpt from http://software.arsdigita.com/www/doc/sql/news.sql:


create sequence news_id_sequence start with 1;

create table news (
	news_id		integer primary key,
	title		varchar(100) not null,
	body		varchar(4000) not null,
	release_date	date not null,
	...
);

We're taking advantage of the nonstandard but very useful Oracle sequence facility. In almost any Oracle SQL statement, you can ask for a sequence's current value or next value.


SQL> create sequence foo_sequence;

Sequence created.

SQL> select foo_sequence.currval from dual;

ERROR at line 1:
ORA-08002: sequence FOO_SEQUENCE.CURRVAL is not yet defined in this session

Oops! Looks like we can't ask for the current value until we've asked for at least one key in our current session with Oracle.


SQL> select foo_sequence.nextval from dual;

   NEXTVAL
----------
	 1

SQL> select foo_sequence.nextval from dual;

   NEXTVAL
----------
	 2

SQL> select foo_sequence.nextval from dual;

   NEXTVAL
----------
	 3

SQL> select foo_sequence.currval from dual;

   CURRVAL
----------
	 3

You can use the sequence generator directly in an insert, e.g.,


insert into news (news_id, title, body, release_date)
values
(news_id_sequence.nextval, 
 'Tuition Refund at MIT',
 'Administrators were shocked and horrified ...',
 '1998-03-12);

Background on this story: http://philip.greenspun.com/school/tuition-free-mit.html

In the ArsDigita Community System implementation, the news_id is actually generated in /news/post-new-2.tcl:


set news_id [database_to_tcl_string $db "select news_id_sequence.nextval from dual"]

This way the page that actually does the database insert, /news/post-new-3.tcl, can be sure when the user has inadvertently hit submit twice:


if [catch { ns_db dml $db "insert into news
(news_id, title, body, html_p, approved_p, 
 release_date, expiration_date, creation_date, creation_user, 
 creation_ip_address)
values
($news_id, '$QQtitle', '$QQbody', '$html_p', '$approved_p',
 '$release_date', '$expiration_date', sysdate, $user_id, 
 '$creation_ip_address')" } errmsg] {
    # insert failed; let's see if it was because of duplicate submission
    if { [database_to_tcl_string $db "select count(*) 
                                      from news
                                      where news_id = $news_id"] == 0 } {
        # some error other than dupe submission
	ad_return_error "Insert Failed" "The database ..."
        return
    }
    # we don't bother to handle the cases where there is a dupe submission
    # because the user should be thanked or redirected anyway
}

In our experience, the standard technique of generating the key at the same time as the insert leads to a lot of duplicate information in the database.

Sequence Caveats

Oracle sequences are optimized for speed. Hence they offer the minimum guarantees that Oracle thinks are required for primary key generation and no more.

If you ask for a few nextvals and roll back your transaction, the sequence will not be rolled back.

You can't rely on sequence values to be, uh, sequential. They will be unique. They will be monotonically increasing. But there might be gaps. The gaps arise because Oracle pulls, by default, 20 sequence values into memory and records those values as used on disk. This makes nextval very fast since the new value need only be marked use in RAM and not on disk. But suppose that someone pulls the plug on your database server after only two sequence values have been handed out. If your database administrator and system administrator are working well together, the computer will come back to life running Oracle. But there will be a gap of 18 values in the sequence (e.g., from 2023 to 2041). That's because Oracle recorded 20 values used on disk and only handed out 2.

Oracle8 Server Application Developer's Guide, Controlling Transactions
Orace8 Server SQL Reference, CREATE SEQUENCE section

---

based on SQL for Web Nerds

Categories: beginner (Audience)

Views

Created by Anett Szabo, last modified by Anett Szabo 25 Jul 2007, at 04:31 PM

The relational database provides programmers with a high degree of abstraction from the physical world of the computer. You can't tell where on the disk the RDBMS is putting each row of a table. For all you know, information in a single row might be split up and spread out across multiple disk drives. The RDBMS lets you add a column to a billion-row table. Is the new information for each row going to be placed next to the pre-existing columns or will a big new block of disk space be allocated to hold the new column value for all billion rows? You can't know and shouldn't really care.

A view is a way of building even greater abstraction.

Suppose that Jane in marketing says that she wants to see a table containing the following information:

user_id
email address
number of static pages viewed
number of bboard postings made
number of comments made

This information is spread out among four tables. However, having read the preceding chapters of this book, you're perfectly equipped to serve Jane's needs with the following query:

select u.user_id, 
       u.email, 
       count(ucm.page_id) as n_pages,
       count(bb.msg_id) as n_msgs,
       count(c.comment_id) as n_comments
from users u, user_content_map ucm, bboard bb, comments c
where u.user_id = ucm.user_id(+)
and u.user_id = bb.user_id(+)
and u.user_id = c.user_id(+)
group by u.user_id, u.email
order by upper(email)

Then Jane adds "I want to see this every day, updated with the latest information. I want to have a programmer write me some desktop software that connects directly to the database and looks at this information; I don't want my desktop software breaking if you reorganize the data model."

create or replace view janes_marketing_view 
as 
select u.user_id, 
       u.email, 
       count(ucm.page_id) as n_pages,
       count(bb.msg_id) as n_msgs,
       count(c.comment_id) as n_comments
from users u, user_content_map ucm, bboard bb, comments c
where u.user_id = ucm.user_id(+)
and u.user_id = bb.user_id(+)
and u.user_id = c.user_id(+)
group by u.user_id, u.email
order by upper(u.email)

To Jane, this will look and act just like a table when she queries it:


select * from janes_marketing_view;

Why should she need to be aware that information is coming from four tables? Or that you've reorganized the RDBMS so that the information subsequently comes from six tables?

Protecting Privacy with Views

A common use of views is protecting confidential data. For example, suppose that all the people who work in a hospital collaborate by using a relational database. Here is the data model:


create table patients (
       patient_id     integer primary key,
       patient_name   varchar(100),
       hiv_positive_p char(1),
       insurance_p    char(1),
       ...
);

If a bunch of hippie idealists are running the hospital, they'll think that the medical doctors shouldn't be aware of a patient's insurance status. So when a doc is looking up a patient's medical record, the looking is done through


create view patients_clinical
as 
select patient_id, patient_name, hiv_positive_p from patients;

The folks over in accounting shouldn't get access to the patients' medical records just because they're trying to squeeze money out of them:


create view patients_accounting
as 
select patient_id, patient_name, insurance_p from patients;

Relational databases have elaborate permission systems similar to those on time-shared computer systems. Each person in a hospital has a unique database user ID. Permission will be granted to view or modify certain tables on a per-user or per-group-of-users basis. Generally the RDBMS permissions facilities aren't very useful for Web applications. It is the Web server that is talking to the database, not a user's desktop computer. So the Web server is responsible for figuring out who is requesting a page and how much to show in response.

Protecting Your Own Source Code

The ArsDigita Shoppe system, described in http://philip.greenspun.com/panda/ecommerce, represents all orders in one table, whether they were denied by the credit card processor, returned by the user, or voided by the merchant. This is fine for transaction processing but you don't want your accounting or tax reports corrupted by the inclusion of failed orders. You can make a decision in one place as to what constitutes a reportable order and then have all of your report programs query the view:


create or replace view sh_orders_reportable 
as
select * from sh_orders
where order_state not in ('confirmed','failed_authorization','void');

Note that in the privacy example (above) we were using the view to leave unwanted columns behind whereas here we are using the view to leave behind unwanted rows.

If we add some order states or otherwise change the data model, the reporting programs need not be touched; we only have to keep this view definition up to date. Note that you can define every view with "create or replace view" rather than "create view"; this saves a bit of typing when you have to edit the definition later.

If you've used select * to define a view and subsequently alter any of the underlying tables, you have to redefine the view. Otherwise, your view won't contain any of the new columns. You might consider this a bug but Oracle has documented it, thus turning the behavior into a feature.

Views-on-the-fly and OUTER JOIN

Let's return to our first OUTER JOIN example, from the simple queries chapter:


select users.user_id, users.email, classified_ads.posted
from users, classified_ads
where users.user_id = classified_ads.user_id(+)
order by users.email, posted;

...
   USER_ID EMAIL			       POSTED
---------- ----------------------------------- ----------
     52790 dbrager@mindspring.com
     37461 dbraun@scdt.intel.com
     52791 dbrenner@flash.net
     47177 dbronz@free.polbox.pl
     37296 dbrouse@enter.net
     47178 dbrown@cyberhighway.net
     36985 dbrown@uniden.com		       1998-03-05
     36985 dbrown@uniden.com		       1998-03-10
     34283 dbs117@amaze.net
     52792 dbsikorski@yahoo.com
...

The plus sign after classified_ads.user_id is our instruction to Oracle to "add NULL rows if you can't meet this JOIN constraint".

Suppose that this report has gotten very long and we're only interested in users whose email addresses start with "db". We can add a WHERE clause constraint on the email column of the users table:


select users.user_id, users.email, classified_ads.posted
from users, classified_ads
where users.user_id = classified_ads.user_id(+)
and users.email like 'db%'
order by users.email, posted;

   USER_ID EMAIL			  POSTED
---------- ------------------------------ ----------
     71668 db-designs@emeraldnet.net
    112295 db1@sisna.com
    137640 db25@umail.umd.edu
     35102 db44@aol.com 		  1999-12-23
     59279 db4rs@aol.com
     95190 db@astro.com.au
     17474 db@hotmail.com
    248220 db@indianhospitality.com
     40134 db@spindelvision.com 	  1999-02-04
    144420 db_chang@yahoo.com
     15020 dbaaru@mindspring.com
...

Suppose that we decide we're only interested in classified ads since January 1, 1999. Let's try the naive approach, adding another WHERE clause constraint, this time on a column from the classified_ads table:


select users.user_id, users.email, classified_ads.posted
from users, classified_ads
where users.user_id = classified_ads.user_id(+)
and users.email like 'db%'
and classified_ads.posted > '1999-01-01'
order by users.email, posted;

   USER_ID EMAIL			  POSTED
---------- ------------------------------ ----------
     35102 db44@aol.com 		  1999-12-23
     40134 db@spindelvision.com 	  1999-02-04
     16979 dbdors@ev1.net		  2000-10-03
     16979 dbdors@ev1.net		  2000-10-26
    235920 dbendo@mindspring.com	  2000-08-03
    258161 dbouchar@bell.mma.edu	  2000-10-26
     39921 dbp@agora.rdrop.com		  1999-06-03
     39921 dbp@agora.rdrop.com		  1999-11-05

8 rows selected.

Hey! This completely wrecked our outer join! All of the rows where the user had not posted any ads have now disappeared. Why? They didn't meet the and classified_ads.posted > '1999-01-01' constraint. The outer join added NULLs to every column in the report where there was no corresponding row in the classified_ads table. The new constraint is comparing NULL to January 1, 1999 and the answer is... NULL. That's three-valued logic for you. Any computation involving a NULL turns out NULL. Each WHERE clause constraint must evaluate to true for a row to be kept in the result set of the SELECT. What's the solution? A "view on the fly". Instead of OUTER JOINing the users table to the classified_ads, we will OUTER JOIN users to a view of classified_ads that contains only those ads posted since January 1, 1999:


select users.user_id, users.email, ad_view.posted
from 
  users, 
  (select * 
   from classified_ads
   where posted > '1999-01-01') ad_view
where users.user_id = ad_view.user_id(+)
and users.email like 'db%'
order by users.email, ad_view.posted;

   USER_ID EMAIL			  POSTED
---------- ------------------------------ ----------
     71668 db-designs@emeraldnet.net
    112295 db1@sisna.com
    137640 db25@umail.umd.edu
     35102 db44@aol.com 		  1999-12-23
     59279 db4rs@aol.com
     95190 db@astro.com.au
     17474 db@hotmail.com
    248220 db@indianhospitality.com
     40134 db@spindelvision.com 	  1999-02-04
    144420 db_chang@yahoo.com
     15020 dbaaru@mindspring.com
...
174 rows selected.

Note that we've named our "view on the fly" ad_view for the duration of this query.

How Views Work

Programmers aren't supposed to have to think about how views work. However, it is worth noting that the RDBMS merely stores the view definition and not any of the data in a view. Querying against a view versus the underlying tables does not change the way that data are retrieved or cached. Standard RDBMS views exist to make programming more convenient or to address security concerns, not to make data access more efficient.

How Materialized Views Work

Starting with Oracle 8.1.5, introduced in March 1999, you can have a materialized view, also known as a summary. Like a regular view, a materialized view can be used to build a black-box abstraction for the programmer. In other words, the view might be created with a complicated JOIN, or an expensive GROUP BY with sums and averages. With a regular view, this expensive operation would be done every time you issued a query. With a materialized view, the expensive operation is done when the view is created and thus an individual query need not involve substantial computation.

Materialized views consume space because Oracle is keeping a copy of the data or at least a copy of information derivable from the data. More importantly, a materialized view does not contain up-to-the-minute information. When you query a regular view, your results includes changes made up to the last committed transaction before your SELECT. When you query a materialized view, you're getting results as of the time that the view was created or refreshed. Note that Oracle lets you specify a refresh interval at which the materialized view will automatically be refreshed.

At this point, you'd expect an experienced Oracle user to say "Hey, these aren't new. This is the old CREATE SNAPSHOT facility that we used to keep semi-up-to-date copies of tables on machines across the network!" What is new with materialized views is that you can create them with the ENABLE QUERY REWRITE option. This authorizes the SQL parser to look at a query involving aggregates or JOINs and go to the materialized view instead. Consider the following query, from the ArsDigita Community System's /admin/users/registration-history.tcl page:


select 
  to_char(registration_date,'YYYYMM') as sort_key,
  rtrim(to_char(registration_date,'Month')) as pretty_month, 
  to_char(registration_date,'YYYY') as pretty_year, 
  count(*) as n_new
from users
group by 
  to_char(registration_date,'YYYYMM'), 
  to_char(registration_date,'Month'), 
  to_char(registration_date,'YYYY')
order by 1;


SORT_K PRETTY_MO PRET	   N_NEW
------ --------- ---- ----------
199805 May	 1998	     898
199806 June	 1998	     806
199807 July	 1998	     972
199808 August	 1998	     849
199809 September 1998	    1023
199810 October	 1998	    1089
199811 November  1998	    1005
199812 December  1998	    1059
199901 January	 1999	    1488
199902 February  1999	    2148

For each month, we have a count of how many users registered at photo.net. To execute the query, Oracle must sequentially scan the users table. If the users table grew large and you wanted the query to be instant, you'd sacrifice some timeliness in the stats with


create materialized view users_by_month
   enable query rewrite
   refresh complete
   start with 1999-03-28
   next sysdate + 1 
   as 
   select 
     to_char(registration_date,'YYYYMM') as sort_key,
     rtrim(to_char(registration_date,'Month')) as pretty_month, 
     to_char(registration_date,'YYYY') as pretty_year, 
     count(*) as n_new
   from users
   group by 
     to_char(registration_date,'YYYYMM'), 
     to_char(registration_date,'Month'), 
     to_char(registration_date,'YYYY')
   order by 1

Oracle will build this view just after midnight on March 28, 1999. The view will be refreshed every 24 hours after that. Because of the enable query rewrite clause, Oracle will feel free to grab data from the view even when a user's query does not mention the view. For example, given the query


select count(*) 
from users 
where rtrim(to_char(registration_date,'Month')) = 'January'
and to_char(registration_date,'YYYY') = '1999'

Oracle would ignore the users table altogether and pull information from users_by_month. This would give the same result with much less work. Suppose that the current month is March 1999, though. The query


select count(*) 
from users 
where rtrim(to_char(registration_date,'Month')) = 'March'
and to_char(registration_date,'YYYY') = '1999'

will also hit the materialized view rather than the users table and hence will miss anyone who has registered since midnight (i.e., the query rewriting will cause a different result to be returned).

High level: Oracle8 Server Concepts, View section. Oracle Application Developer's Guide, Managing Views section.
Low level: Oracle8 Server SQL Reference, Create View syntax, and Create Materialized View section

---

based on SQL for Web Nerds

Categories: beginner (Audience)