test-doc - More complex queries

I OpenACS For Everyone
- I.1 High level information: What is OpenACS?
  - I.1.1 Overview
  - I.1.2 OpenACS Release Notes
- I.2 OpenACS: robust web development framework
  - I.2.1 Introduction
  - I.2.2 Basic infrastructure
  - I.2.3 Advanced infrastructure
  - I.2.4 Domain level tools
II Administrator's Guide
- II.2 Installation Overview
  - II.2.1 Basic Steps
  - II.2.2 Prerequisite Software
- II.3 Complete Installation
  - II.3.1 Install a Unix-like system and supporting software
  - II.3.2 Install Oracle 10g XE on debian
    - II.3.2.1 Install Oracle 8.1.7
  - II.3.3 Install PostgreSQL
  - II.3.4 Install AOLserver 4
  - II.3.5 Quick Install of OpenACS
    - II.3.5.1 Complex Install OpenACS 5.3
  - II.3.6 OpenACS Installation Guide for Windows2000
  - II.3.7 OpenACS Installation Guide for Mac OS X
- II.4 Configuring a new OpenACS Site
  - II.4.1 Installing OpenACS packages
  - II.4.2 Mounting OpenACS packages
  - II.4.3 Configuring an OpenACS package
  - II.4.4 Setting Permissions on an OpenACS package
  - II.4.5 How Do I?
  - II.4.6 Configure OpenACS look and feel with templates
- II.5 Upgrading
  - II.5.1 Overview
  - II.5.2 Upgrading 4.5 or higher to 4.6.3
  - II.5.3 Upgrading OpenACS 4.6.3 to 5.0
  - II.5.4 Upgrading an OpenACS 5.0.0 or greater installation
  - II.5.5 Upgrading the OpenACS files
  - II.5.6 Upgrading Platform components
- II.6 Production Environments
  - II.6.1 Starting and Stopping an OpenACS instance.
  - II.6.2 AOLserver keepalive with inittab
  - II.6.3 Running multiple services on one machine
  - II.6.4 High Availability/High Performance Configurations
  - II.6.5 Staged Deployment for Production Networks
  - II.6.6 Installing SSL Support for an OpenACS service
  - II.6.7 Set up Log Analysis Reports
  - II.6.8 External uptime validation
  - II.6.9 Diagnosing Performance Problems
- II.7 Database Management
  - II.7.1 Running a PostgreSQL database on another server
  - II.7.2 Deleting a tablespace
  - II.7.3 Vacuum Postgres nightly
- II.8 Backup and Recovery
  - II.8.1 Backup Strategy
  - II.8.2 Manual backup and recovery
  - II.8.3 Automated Backup
  - II.8.4 Using CVS for backup-recovery
- II.A Install Red Hat 8/9
- II.B Install additional supporting software
  - II.B.1 Unpack the OpenACS tarball
  - II.B.2 Initialize CVS (OPTIONAL)
  - II.B.3 Add PSGML commands to emacs init file (OPTIONAL)
  - II.B.4 Install Daemontools (OPTIONAL)
  - II.B.5 Install qmail (OPTIONAL)
  - II.B.6 Install Analog web file analyzer
  - II.B.7 Install nspam
  - II.B.8 Install Full Text Search
  - II.B.9 Install Full Text Search using Tsearch2
  - II.B.10 Install Full Text Search using OpenFTS (deprecated see tsearch2)
  - II.B.11 Install nsopenssl
  - II.B.12 Install tclwebtest.
  - II.B.13 Install PHP for use in AOLserver
  - II.B.14 Install Squirrelmail for use as a webmail system for OpenACS
  - II.B.15 Install PAM Radius for use as external authentication
  - II.B.16 Install LDAP for use as external authentication
  - II.B.17 Install AOLserver 3.3oacs1
- II.C Credits
  - II.C.1 Where did this document come from?
  - II.C.2 Linux Install Guides
  - II.C.3 Security Information
  - II.C.4 Resources
III For OpenACS Package Developers
- III.9 Development Tutorial
  - III.9.1 Creating an Application Package
  - III.9.2 Setting Up Database Objects
  - III.9.3 Creating Web Pages
  - III.9.4 Debugging and Automated Testing
- III.10 Advanced Topics
  - III.10.1 Write the Requirements and Design Specs
  - III.10.2 Add the new package to CVS
  - III.10.3 OpenACS Edit This Page Templates
  - III.10.4 Adding Comments
  - III.10.5 Admin Pages
  - III.10.6 Categories
  - III.10.7 Profile your code
  - III.10.8 Prepare the package for distribution.
  - III.10.9 Distributing upgrades of your package
  - III.10.10 Notifications
  - III.10.11 Hierarchical data
  - III.10.12 Using .vuh files for pretty urls
  - III.10.13 Laying out a page with CSS instead of tables
  - III.10.14 Sending HTML email from your application
  - III.10.15 Basic Caching
  - III.10.16 Scheduled Procedures
  - III.10.17 Enabling WYSIWYG
  - III.10.18 Adding in parameters for your package
  - III.10.19 Writing upgrade scripts
  - III.10.20 Connect to a second database
  - III.10.21 Future Topics
- III.11 Development Reference
  - III.11.1 OpenACS Packages
  - III.11.2 OpenACS Data Models and the Object System
  - III.11.3 The Request Processor
  - III.11.4 The OpenACS Database Access API
  - III.11.5 Using Templates in OpenACS
  - III.11.6 Groups, Context, Permissions
  - III.11.7 Writing OpenACS Application Pages
  - III.11.8 Parties in OpenACS
  - III.11.9 OpenACS Permissions Tediously Explained
  - III.11.10 Object Identity
  - III.11.11 Programming with AOLserver
  - III.11.12 Using Form Builder: building html forms dynamically
- III.12 Engineering Standards
  - III.12.1 OpenACS Style Guide
  - III.12.2 Release Version Numbering
  - III.12.3 Constraint naming standard
  - III.12.4 ACS File Naming and Formatting Standards
  - III.12.5 PL/SQL Standards
  - III.12.6 Variables
  - III.12.7 Automated Testing
- III.13 CVS Guidelines
  - III.13.1 Using CVS with OpenACS
  - III.13.2 OpenACS CVS Concepts
  - III.13.3 Contributing code back to OpenACS
  - III.13.4 Additional Resources for CVS
- III.14 Documentation Standards
  - III.14.1 OpenACS Documentation Guide
  - III.14.2 Using PSGML mode in Emacs
  - III.14.3 Using nXML mode in Emacs
  - III.14.4 Detailed Design Documentation Template
  - III.14.5 System/Application Requirements Template
- III.15 TCLWebtest
  - III.15.1 API test
  - III.15.2 Webtest
- III.16 Internationalization
  - III.16.1 Internationalization and Localization Overview
  - III.16.2 How Internationalization/Localization works in OpenACS
  - III.16.4 Design Notes
  - III.16.5 Translator's Guide
- III.D Using CVS with an OpenACS Site
IV For OpenACS Platform Developers
- IV.17 Kernel Documentation
  - IV.17.1 Overview
  - IV.17.2 Object Model Requirements
  - IV.17.3 Object Model Design
  - IV.17.4 Permissions Requirements
  - IV.17.5 Permissions Design
  - IV.17.6 Groups Requirements
  - IV.17.7 Groups Design
  - IV.17.8 Subsites Requirements
  - IV.17.9 Subsites Design Document
  - IV.17.10 Package Manager Requirements
  - IV.17.11 Package Manager Design
  - IV.17.12 Database Access API
  - IV.17.13 OpenACS Internationalization Requirements
  - IV.17.14 Security Requirements
  - IV.17.15 Security Design
  - IV.17.16 Security Notes
  - IV.17.17 Request Processor Requirements
  - IV.17.18 Request Processor Design
  - IV.17.19 Documenting Tcl Files: Page Contracts and Libraries
  - IV.17.20 Bootstrapping OpenACS
  - IV.17.21 External Authentication Requirements
- IV.18 Releasing OpenACS
  - IV.18.1 OpenACS Core and .LRN
  - IV.18.2 How to Update the OpenACS.org repository
  - IV.18.3 How to package and release an OpenACS Package
  - IV.18.4 How to Update the translations
V Tcl for Web Nerds
- V.1 Tcl for Web Nerds Introduction
- V.2 Basic String Operations
- V.3 List Operations
- V.4 Pattern matching
- V.5 Array Operations
- V.6 Numbers
- V.7 Control Structure
- V.8 Scope, Upvar and Uplevel
- V.9 File Operations
- V.10 Eval
- V.11 Exec
- V.12 Tcl for Web Use
- V.13 OpenACS conventions for TCL
- V.14 Solutions
VI SQL for Web Nerds
- VI.1 SQL Tutorial
  - VI.1.1 SQL Tutorial
  - VI.1.2 Answers
- VI.2 SQL for Web Nerds Introduction
- VI.3 Data modeling
  - VI.3.1 The Discussion Forum -- philg's personal odyssey
  - VI.3.2 Data Types (Oracle)
  - VI.3.4 Tables
  - VI.3.5 Constraints
- VI.4 Simple queries
- VI.5 More complex queries
- VI.6 Transactions
- VI.7 Triggers
- VI.8 Views
- VI.9 Style
- VI.10 Escaping to the procedural world
- VI.11 Trees

97.14%

· Index

VI.5 More complex queries

Suppose that you want to start lumping together information from multiple rows. For example, you're interested in JOINing users with their classified ads. That will give you one row per ad posted. But you want to mush all the rows together for a particular user and just look at the most recent posting time. What you need is the GROUP BY construct:


select users.user_id, users.email, max(classified_ads.posted)
from users, classified_ads
where users.user_id = classified_ads.user_id
group by users.user_id, users.email
order by upper(users.email);

   USER_ID EMAIL			       MAX(CLASSI
---------- ----------------------------------- ----------
     39406 102140.1200@compuserve.com	       1998-10-08
     39842 102144.2651@compuserve.com	       1998-12-13
     41426 50@seattle.va.gov		       1997-01-13
     37428 71730.345@compuserve.com	       1998-11-24
     35970 aaibrahim@earthlink.net	       1998-11-08
     36671 absolutsci@aol.com		       1998-10-06
     35781 alevy@agtnet.com		       1997-07-14
     40111 alexzorba@aol.com		       1998-09-25
     39060 amchiu@worldnet.att.net	       1998-12-11
     35989 andrew.c.beckner@bankamerica.com    1998-08-13
     33923 andy_roo@mit.edu		       1998-12-10

The group by users.user_id, users.email tells SQL to "lump together all the rows that have the same values in these two columns." In addition to the grouped by columns, we can run aggregate functions on the columns that aren't being grouped. For example, the MAX above applies to the posting dates for the rows in a particular group. We can also use COUNT to see at a glance how active and how recently active a user has been:


select users.user_id, users.email, count(*), max(classified_ads.posted)
from users, classified_ads
where users.user_id = classified_ads.user_id
group by users.user_id, users.email
order by upper(users.email);

   USER_ID EMAIL				 COUNT(*) MAX(CLASSI
---------- ----------------------------------- ---------- ----------
     39406 102140.1200@compuserve.com			3 1998-10-08
     39842 102144.2651@compuserve.com			3 1998-12-13
     41426 50@seattle.va.gov				1 1997-01-13
     37428 71730.345@compuserve.com			3 1998-11-24
     35970 aaibrahim@earthlink.net			1 1998-11-08
     36671 absolutsci@aol.com				2 1998-10-06
     35781 alevy@agtnet.com				1 1997-07-14
     40111 alexzorba@aol.com				1 1998-09-25
     39060 amchiu@worldnet.att.net			1 1998-12-11
     35989 andrew.c.beckner@bankamerica.com		1 1998-08-13
     33923 andy_roo@mit.edu				1 1998-12-10

A publisher who was truly curious about this stuff probably wouldn't be interested in these results alphabetically. Let's find our most recently active users. At the same time, let's get rid of the unsightly "MAX(CLASSI" at the top of the report:


select users.user_id, 
       users.email, 
       count(*) as how_many,
       max(classified_ads.posted) as how_recent
from users, classified_ads
where users.user_id = classified_ads.user_id
group by users.user_id, users.email
order by how_recent desc, how_many desc;

   USER_ID EMAIL				 HOW_MANY HOW_RECENT
---------- ----------------------------------- ---------- ----------
     39842 102144.2651@compuserve.com			3 1998-12-13
     39968 mkravit@mindspring.com			1 1998-12-13
     36758 mccallister@mindspring.com			1 1998-12-13
     38513 franjeff@alltel.net				1 1998-12-13
     34530 nverdesoto@earthlink.net			3 1998-12-13
     34765 jrl@blast.princeton.edu			1 1998-12-13
     38497 jeetsukumaran@pd.jaring.my			1 1998-12-12
     38879 john.macpherson@btinternet.com		5 1998-12-12
     37808 eck@coastalnet.com				1 1998-12-12
     37482 dougcan@arn.net				1 1998-12-12

Note that we were able to use our correlation names of "how_recent" and "how_many" in the ORDER BY clause. The desc ("descending") directives in the ORDER BY clause instruct Oracle to put the largest values at the top. The default sort order is from smallest to largest ("ascending").

Upon close inspection, the results are confusing. We instructed Oracle to rank first by date and second by number of postings. Yet for 1998-12-13 we don't see both users with three total postings at the top. That's because Oracle dates are precise to the second even when the hour, minute, and second details are not displayed by SQL*Plus. A better query would include the clause


order by trunc(how_recent) desc, how_many desc

where the built-in Oracle function trunc truncates each date to midnight on the day in question.

Finding co-moderators: The HAVING Clause

The WHERE clause restricts which rows are returned. The HAVING clause operates analogously but on groups of rows. Suppose, for example, that we're interested in finding those users who've contributed heavily to our discussion forum:


select user_id, count(*) as how_many
from bboard
group by user_id
order by how_many desc;

   USER_ID   HOW_MANY
---------- ----------
     34474	 1922
     35164	  985
     41112	  855
     37021	  834
     34004	  823
     37397	  717
     40375	  639
...
     33963	    1
     33941	    1
     33918	    1

7348 rows selected.

Seventy three hundred rows. That's way too big considering that we are only interested in nominating a couple of people. Let's restrict to more recent activity. A posting contributed three years ago is not necessarily evidence of interest in the community right now.


select user_id, count(*) as how_many
from bboard
where posting_time + 60 > sysdate
group by user_id
order by how_many desc;

   USER_ID   HOW_MANY
---------- ----------
     34375	   80
     34004	   79
     37903	   49
     41074	   46
...

1120 rows selected.

We wanted to kill rows, not groups, so we did it with a WHERE clause. Let's get rid of the people who are already serving as maintainers.


select user_id, count(*) as how_many
from bboard
where not exists (select 1 from 
                  bboard_authorized_maintainers bam
                  where bam.user_id = bboard.user_id)
and posting_time + 60 > sysdate
group by user_id
order by how_many desc;

The concept of User ID makes sense for both rows and groups, so we can restrict our results either with the extra WHERE clause above or by letting the relational database management system produce the groups and then we'll ask that they be tossed out using a HAVING clause:


select user_id, count(*) as how_many
from bboard
where posting_time + 60 > sysdate
group by user_id
having not exists (select 1 from 
                  bboard_authorized_maintainers bam
                  where bam.user_id = bboard.user_id)
order by how_many desc;

This doesn't get to the root cause of our distressingly large query result: we don't want to see groups where how_many is less than 30. Here's the SQL for "show me users who've posted at least 30 messages in the past 60 days, ranked in descending order of volubility":


select user_id, count(*) as how_many
from bboard
where posting_time + 60 > sysdate
group by user_id
having count(*) >= 30
order by how_many desc;

   USER_ID   HOW_MANY
---------- ----------
     34375	   80
     34004	   79
     37903	   49
     41074	   46
     42485	   46
     35387	   30
     42453	   30

7 rows selected.

We had to do this in a HAVING clause because the number of rows in a group is a concept that doesn't make sense at the per-row level on which WHERE clauses operate.

Oracle 8's SQL parser is too feeble to allow you to use the how_many correlation variable in the HAVING clause. You therefore have to repeat the count(*) incantation.

Set Operations: UNION, INTERSECT, and MINUS

Oracle provides set operations that can be used to combine rows produced by two or more separate SELECT statements. UNION pools together the rows returned by two queries, removing any duplicate rows. INTERSECT combines the result sets of two queries by removing any rows that are not present in both. MINUS combines the results of two queries by taking the the first result set and subtracting from it any rows that are also found in the second. Of the three, UNION is the most useful in practice.

In the ArsDigita Community System ticket tracker, people reporting a bug or requesting a feature are given a menu of potential deadlines. For some projects, common project deadlines are stored in the ticket_deadlines table. These should appear in an HTML SELECT form element. We also, however, want some options like "today", "tomorrow", "next week", and "next month". The easiest way to handle these is to query the dual table to perform some date arithmetic. Each of these queries will return one row and if we UNION four of them together with the query from ticket_deadlines, we can have the basis for a very simple Web script to present the options:


select 
  'today - ' || to_char(trunc(sysdate),'Mon FMDDFM'),
  trunc(sysdate) as deadline
from dual 
UNION
select 
  'tomorrow - '|| to_char(trunc(sysdate+1),'Mon FMDDFM'), 
  trunc(sysdate+1) as deadline
from dual 
UNION
select
  'next week - '|| to_char(trunc(sysdate+7),'Mon FMDDFM'), 
  trunc(sysdate+7) as deadline
from dual 
UNION
select 
  'next month - '|| to_char(trunc(ADD_MONTHS(sysdate,1)),'Mon FMDDFM'), 
  trunc(ADD_MONTHS(sysdate,1)) as deadline 
from dual
UNION
select 
  name || ' - ' || to_char(deadline, 'Mon FMDDFM'), 
  deadline
from ticket_deadlines 
where project_id = :project_id
and deadline >= trunc(sysdate)
order by deadline

will produce something like

<form>

<select name="deadline_choice">

<option value="2000-10-28">today - Oct 28

</option><option value="2000-10-29">tomorrow - Oct 29

</option><option value="2000-11-04">next week - Nov 4

</option><option value="2000-11-28">next month - Nov 28

</option><option value="2000-12-01">V2.0 freeze - Dec 1

</option><option value="2000-12-15">V2.0 ship - Dec 15

</option></select>

</form>

The INTERSECT and MINUS operators are seldom used. Here are some contrived examples. Suppose that you collect contest entries by Web users, each in a separate table:


create table trip_to_paris_contest (
	user_id		references users,
	entry_date	date not null
);

create table camera_giveaway_contest (
	user_id		references users,
	entry_date	date not null
);

Now let's populate with some dummy data:


-- all three users love to go to Paris
insert into trip_to_paris_contest values (1,'2000-10-20');
insert into trip_to_paris_contest values (2,'2000-10-22');
insert into trip_to_paris_contest values (3,'2000-10-23');

-- only User #2 is a camera nerd
insert into camera_giveaway_contest values (2,'2000-05-01');

Suppose that we've got a new contest on the site. This time we're giving away a trip to Churchill, Manitoba to photograph polar bears. We assume that the most interested users will be those who've entered both the travel and the camera contests. Let's get their user IDs so that we can notify them via email (spam) about the new contest:


select user_id from trip_to_paris_contest
intersect
select user_id from camera_giveaway_contest;

   USER_ID
----------
	 2

Or suppose that we're going to organize a personal trip to Paris and want to find someone to share the cost of a room at the Crillon. We can assume that anyone who entered the Paris trip contest is interested in going. So perhaps we should start by sending them all email. On the other hand, how can one enjoy a quiet evening with the absinthe bottle if one's companion is constantly blasting away with an electronic flash? We're interested in people who entered the Paris trip contest but who did not enter the camera giveaway:


select user_id from trip_to_paris_contest
minus
select user_id from camera_giveaway_contest;

   USER_ID
----------
	 1
	 3

---

based on SQL for Web Nerds

Categories: beginner (Audience)