<?xml version='1.0' ?>
<!DOCTYPE sect1 PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd" [
<!ENTITY % myvars SYSTEM "../variables.ent">
%myvars;
]>
<sect1 id="docbook-primer">
<title>OpenACS Documentation Guide</title>
<para>
By Claus Rasmussen, with additions by Roberto Mello, Vinod Kurup, and the OpenACS community
</para>
<sect2 id="dbprimer-overview" xreflabel="OpenACS Documentation Overview">
<title>Overview of OpenACS Documentation</title>
<para>
<productname>OpenACS</productname> is a powerful system with
incredible possibilities and applications, but
this power comes with some complexity and a steep learning curve
that is only attenuated by good documentation. Our goal is to write
superb documentation, so that users, developers and administrators
of OpenACS installations can enjoy the system.
</para>
<para>
OpenACS documentation began by
building on a good documentation base from ArsDigita's ACS in the
late 1990s. Some sections of the documentation, however, lacked
details and examples; others simply did not exist. The OpenACS
community began meeting the challenge by identifying needs and
writing documentation on an as-needed basis.
</para>
<para>
Because documentation depended on volunteers and code
developers, documentation updates lagged behind the evolving
system software. As significant development changes were made
to the system, existing documentation became dated, and its
value was significantly reduced. Valiant efforts to keep the
documentation current proved too difficult, because changes to
the system sometimes had far-reaching effects on pages
throughout the documentation. System integration and
optimization quickly rendered documentation obsolete for
developers, and the code itself became the substitute and source
for documentation.
</para>
<para>
With thousands of lines of code and few developers tracking
changes, features and advances to the OpenACS system went
unnoticed or were not well understood except by the code
authors. Work was duplicated because developers did not
realize how much significant work others had already completed.
New developers had to learn the system through hands-on
experience and discussion in the forums. Informal sharing
of experiential and tacit knowledge has become the OpenACS
community's main method of sharing knowledge.
</para>
<para>
This document attempts to shape ongoing documentation efforts by
using principles of continual improvement to re-engineer
documentation production.
</para>
</sect2>
<sect2 id="docs-managing" xreflabel="Managing OpenACS Documentation">
<title>Managing OpenACS Documentation</title>
<para>
Documentation production shares many of the challenges of
software development, such as managing contributions, revisions
and the (editorial) release cycle. This is yet another
experiment in improving documentation, this time using
principles of continual improvement to focus the ongoing
efforts. These processes are outlined as project management
phases:
</para>
<orderedlist>
<listitem><para>
<emphasis role="strong">Requirements phase</emphasis> is about setting goals and
specifications, and includes exploration of scenarios, use cases,
etc. As an example, see the <ulink
url="http://openacs.org/doc/openacs-4/requirements-template.html">
OpenACS Documentation Requirements Template</ulink>, which focuses on
systems requirements for developers.
</para></listitem>
<listitem><para>
<emphasis role="strong">Strategy phase</emphasis> is about creating an approach
to doing work. It sets behavioral guidelines and boundaries
that help keep perspective on how efforts are directed.
OpenACS developers discuss strategy when coordinating
efforts such as code revisioning and new features.
</para></listitem>
<listitem><para>
<emphasis role="strong">Planning phase</emphasis> is about explicitly stating
the way to implement the strategy as a set of methods.
OpenACS system design requires planning. For example, see the
package design planning in the <ulink
url="http://openacs.org/doc/openacs-4/filename.html">OpenACS
documentation template</ulink>.
</para></listitem>
<listitem><para>
<emphasis role="strong">Implementation phase</emphasis> is about performing the
work according to the plan, where decisions on how to handle
unforeseen circumstances are guided by the strategy and
requirements.
</para></listitem>
<listitem><para>
<emphasis role="strong">Verification phase</emphasis> measures how well the plan
was implemented. Success is measured by (a) verifying whether the
project has met the established goals, and (b) reviewing
ongoing problem areas. OpenACS verifies work
through different means on different projects, but in all
cases the OpenACS community confirms a project's
success through feedback, including bug reports, user and
administrator comments, and code changes.
</para></listitem>
</orderedlist>
<para>
OpenACS forum discussions on documentation requirements and strategies are
summarized in the following sections. Production
phases are mainly organized and fulfilled by a designated documentation maintainer.
Hopefully the following sections will help spur greater
direct participation by the OpenACS community.
</para>
</sect2>
<sect2 id="docs-requirements" xreflabel="OpenACS General Documentation Requirements">
<title>OpenACS General Documentation Requirements</title>
<para>
By the OpenACS community. This section is a collection of
documentation requirements that have been expressed in the
OpenACS forums up to 4 July 2003.
</para>
<para>
OpenACS documentation should meet the following requirements. No
significance has been given to the order presented, topic breadth or depth here.
</para>
<itemizedlist>
<listitem><para>
Clarity in presentation. <ulink
url="http://www.lifewithqmail.org/lwq.html">Life with
qmail</ulink> is a recommended example of highly rated online
documentation.
</para></listitem>
<listitem><para>
Avoid requirements that significantly increase the labor
required to maintain documentation.
</para></listitem>
<listitem>
<para>
Use best practices learned from the print world, web, and
other media, about use of gamma, space, writing style etc.
</para>
<itemizedlist>
<listitem><para>
Consistency in publishing: establish and adhere to publishing standards.
</para></listitem>
<listitem><para>
Use standardized language: write in international English
(without slang or colloquial terms) for ESL (English as
a second language) readers. This also makes translation
easier for those interested in translating the
documentation as part of internationalization efforts.
</para></listitem>
<listitem><para>
All jargon used in documentation needs to be defined.
Use standardized terms when available, and do not assume
the reader already understands OpenACS-specific terms.
</para></listitem>
<listitem><para>
Document titles (for example, on HTML pages) should
include the whole document title, in the form
(book title) : (chapter title) : (section), so that bookmarks
and similar references indicate location in a manner similar
to pages in printed books.
</para></listitem>
<listitem><para>
Organize each document according to the needs of the reader
(which may differ from the wishes of the writers).
</para></listitem>
<listitem><para>
Do not make informal exclamations about how difficult or easy
a task is to complete or understand, for
example, "Simply...". Readers come from many different
backgrounds; remember that the wider audience is
likely as varied as the readers on the internet. If
important, state preconditions or knowledge
requirements when they differ from the rest of the
context of the document. For example, "requires basic
competency with a text-based editor such as vi or emacs
via telnet".
</para></listitem>
</itemizedlist>
</listitem>
<listitem><para>
Show where to find current information instead of writing
about current info that becomes obsolete. If the information
is not found elsewhere, then create one place for it, where
others can refer to it. This structure of information will
significantly reduce obsolescence in writing and the labor burden
of maintaining up-to-date documentation. In other words, state
facts only in appropriately focused, designated areas, then
refer to them with links.
</para>
<para>
Note: Sometimes facts should be stated multiple ways, to
accommodate different reading style preferences. They should
still be in one area, using a common layout of perhaps a
summary, an introduction, and a discussion requiring increasing
expertise, complexity or specificity.
</para></listitem>
<listitem><para>
Consistency in link descriptions: when a link URL refers to
a whole document, make the link text (the anchor-wrapped title)
the same as the title and/or heading of the document it
points to.
</para></listitem>
<listitem><para>
Consider OpenACS documentation as a set of books (an
encyclopedic set organized like an atlas) that contains
volumes (books). Each book contains chapters and sections,
much like the DocBook examples, where each chapter
is a web page. This designation could help create an OpenACS
book in print, and help new readers visualize how the
documentation is organized.
</para></listitem>
<listitem><para>
The licenses of OpenACS and ArsDigita's ACS are not
compatible, which places strict limits on how much
OpenACS developers should access ArsDigita code and
resources. The OpenACS documentation has a new legal
requirement: eliminate any dependency on learning about
the system from ArsDigita ACS examples, to minimize any
inference of license noncompliance, while recognizing the
important work accomplished by Philip Greenspun, ArsDigita,
and the early ACS adopters.
</para></listitem>
<listitem><para>
Use a consistent general outline for each book.
<itemizedlist>
<listitem><para>
Introduction (includes purpose/goal), Glossary of terms,
Credits, License, Copyright, Revision History
</para></listitem>
<listitem><para>
A table of contents (TOC) for each book: end-users, content and site
administrators, marketing, developer tutorial, and
developers.
</para></listitem>
<listitem><para>
Priorities of order and content vary based on each of
the different readers mentioned. The developer's guide
should be organized to be most useful to the priorities
of developers, while being consistent with the general
documentation requirements, including publishing strategy,
style, etc.
</para></listitem>
<listitem><para>
Use generic DocBook syntax to maximize reader familiarity with the documents.
<programlisting>
&lt;book&gt;&lt;title&gt;&lt;part label="Part 1"&gt;&lt;etc...&gt;
</programlisting>
</para></listitem>
</itemizedlist>
</para></listitem>
</itemizedlist>
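<para>
The general outline above might be expressed as the following minimal
DocBook XML 4.4 skeleton. It is only a sketch: every
<computeroutput>id</computeroutput>, title, name and date in it is a
hypothetical placeholder, not part of an existing OpenACS book. The
<computeroutput>xref</computeroutput> element also illustrates one way to
keep link descriptions consistent: a stylesheet such as the DocBook XSL
stylesheets typically generates the link text from the target's title (or
its <computeroutput>xreflabel</computeroutput>), so the text cannot drift
away from the heading it points to.
</para>
<programlisting>
&lt;!-- Hypothetical skeleton: ids, titles, names and dates are placeholders --&gt;
&lt;book id="admin-guide"&gt;
  &lt;bookinfo&gt;
    &lt;title&gt;OpenACS Administrator's Guide&lt;/title&gt;
    &lt;authorgroup&gt;
      &lt;corpauthor&gt;OpenACS Community&lt;/corpauthor&gt;
    &lt;/authorgroup&gt;
    &lt;copyright&gt;
      &lt;year&gt;2003&lt;/year&gt;
      &lt;holder&gt;OpenACS Community&lt;/holder&gt;
    &lt;/copyright&gt;
    &lt;legalnotice&gt;&lt;para&gt;License text goes here.&lt;/para&gt;&lt;/legalnotice&gt;
    &lt;revhistory&gt;
      &lt;revision&gt;
        &lt;revnumber&gt;0.1&lt;/revnumber&gt;
        &lt;date&gt;2003-07-04&lt;/date&gt;
        &lt;revremark&gt;First outline.&lt;/revremark&gt;
      &lt;/revision&gt;
    &lt;/revhistory&gt;
  &lt;/bookinfo&gt;

  &lt;preface id="admin-intro"&gt;
    &lt;title&gt;Introduction&lt;/title&gt;
    &lt;!-- Link text is generated from the target chapter's title --&gt;
    &lt;para&gt;Purpose and goals of this book. Start with
      &lt;xref linkend="admin-overview"/&gt;.&lt;/para&gt;
  &lt;/preface&gt;

  &lt;part label="Part 1"&gt;
    &lt;title&gt;Getting Started&lt;/title&gt;
    &lt;chapter id="admin-overview"&gt;
      &lt;title&gt;System Overview&lt;/title&gt;
      &lt;sect1 id="admin-components"&gt;
        &lt;title&gt;How the Components Fit Together&lt;/title&gt;
        &lt;para&gt;...&lt;/para&gt;
      &lt;/sect1&gt;
    &lt;/chapter&gt;
  &lt;/part&gt;

  &lt;glossary id="admin-glossary"&gt;
    &lt;title&gt;Glossary&lt;/title&gt;
    &lt;glossentry id="gloss-subsite"&gt;
      &lt;glossterm&gt;subsite&lt;/glossterm&gt;
      &lt;glossdef&gt;&lt;para&gt;...&lt;/para&gt;&lt;/glossdef&gt;
    &lt;/glossentry&gt;
  &lt;/glossary&gt;
&lt;/book&gt;
</programlisting>
<para>
The table of contents for each book is normally generated by the
stylesheet from this structure rather than written by hand.
</para>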
</sect2>
<sect2 id="docs-end-user-reqs" xreflabel="End-user Documentation Requirements">
<title>OpenACS Documentation Requirements for End-users</title>
<para>
By the OpenACS community. This section is a collection of
documentation requirements that have been expressed in the
OpenACS forums up to 4 July 2003.
</para>
<para>
OpenACS end-user documentation should meet the following requirements. No
significance has been given to the order presented, topic breadth or depth here.
</para>
<itemizedlist>
<listitem>
<para>
End-users should not have to read docs to use the system.
</para>
</listitem>
<listitem><para>
Include how to get help. How and where to find answers,
contact others, what to do if one gets an AOLserver or other
error when using the system. Include types of available
support (open-source, private commercial etc.) including
references.
</para></listitem>
<listitem><para>
Explain and foster understanding of the overall structure of the
system. This would be an overview of the system components,
how it works, and how to find out more or dig deeper. Also
promote the system by presenting its history, and by
writing about tacit knowledge regarding OpenACS.org and
open-source culture.
</para></listitem>
<listitem><para>
Introduce and inspire readers about the uses, benefits, and
possibilities this system brings (think customer
solution, customer cost, convenience, value): a
comprehensive community communications system; how this
system is valuable to users; reasons others use OpenACS
(with quotes in their own words), for example "...the most important
thing that the ACS does is manage users, i.e. provide a way
to group, view and manipulate members of the web community.
-- Talli Somekh, September 19, 2001"; using it to
communicate, cooperate, collaborate... OpenACS offers
directed content functionality with the OpenACS templating
system. ... OpenACS is more than a data collection and
presentation tool. OpenACS has management facilities that
are absent in other portals. ...The beauty of OpenACS is
the simplicity (and scalability) of the platform on which it
is built and the library of tried and tested community
building tools that are waiting to be added. It seems that
most portals just add another layer of complexity to the
cake. See <ulink
url="http://openacs.org/bboard/q-and-a-fetch-msg.tcl?msg_id=00058H&amp;topic_id=11&amp;topic=OpenACS">Slides on OACS
features</ulink>, a set of slides on OACS features that can
be used for beginners who want to know what OACS is about and
what they can do with it. Include screen captures that highlight
features; the example shows BBoard, calendar, news, file
storage, wimpy point, and ticket tracking. Consider an OpenACS tour: an
abbreviated, interactive set of demo pages.
</para></listitem>
<listitem><para>
From a marketing perspective,
<itemizedlist>
<listitem><para>
differentiate "product" by highlighting features,
performance quality, conformance to standards,
durability (handling of technological obsolescence),
reliability, repairability, style of use, design
(strategy in design, specifications, integrated,
well-matched systems etc).
</para></listitem>
<listitem><para>
differentiate "service" by highlighting software
availability (licensing, and completeness from mature to
early-adopter or development versions), community incident
support, project collaboration opportunities, and
contractor support availability
</para></listitem>
<listitem><para>
differentiate price (economic considerations of
opensource and features)
</para></listitem>
<listitem><para>
Discussion and details should rely on meeting criteria
of design, completeness of implementation, and related
system strengths and weaknesses. Marketing should not
rely on comparing to other technologies. Competitive
analysis involves mapping out strengths, weaknesses,
opportunities and threats when compared to other systems
for a specific purpose, and thus is inappropriate (and
becomes stale quickly) for general documentation.
</para></listitem>
<listitem><para>
When identifying subsystems, such as Tcl, include links
to their marketing material if available.
</para></listitem>
<listitem><para>
create an example/template comparison table that shows
versions of OpenACS and other systems (commonly
competing against OpenACS) versus a summary feature list
and how well each meets the feature criteria. Each
system should be marked with a date to indicate time
information was gathered, since information is likely
volatile.
</para></listitem>
</itemizedlist>
</para></listitem>
<listitem><para>
To build awareness about OpenACS, consider product
differentiation: form, features, performance quality,
conformance quality (to standards and requirements),
durability, reliability, repairability, style, design: the
deliberate planning of these product attributes.
</para></listitem>
<listitem><para>
Include jargon definitions, a glossary, FAQs, and a site map/index,
including where to find instructions for using the packages.
FAQ entries with similar answers should refer to the same place for
consistency, brevity and maintainability.
</para></listitem>
<listitem><para>
Provide an explanation or tutorial on how the UI works (links do more than go
to places; they are active), page flow, descriptions of form
elements, and browser/interface strengths and limitations (cookies, other).
</para></listitem>
<listitem><para>
Discuss criteria used to decide which features are
important, and the quality of the implementation from a
user's perspective. Each project implementation places a
different emphasis on the various criteria, which is why
providing a framework to help decide is probably more useful
than an actual comparison.
</para></listitem>
</itemizedlist>
<para>
Package documentation has additional requirements.
</para>
<itemizedlist>
<listitem><para>
A list of all packages, their names, their purposes, what
they can and cannot do (strengths, limitations), what
differentiates them from similar packages, minimal
description, current version, implementation status,
author/maintainers, link(s) to more info. Current version
available at the <ulink
url="http://openacs.org/repository/5-2/">repository</ulink>.
</para></listitem>
<listitem><para>
Include dependencies/requirements, known conflicts, and
comments from the real world edited into a longer
description to quickly learn if a package is appropriate for
specific projects.
</para></listitem>
<listitem><para>
Create a long bulleted list of features. Feature list should
go deeper than high-level feature lists and look at the
quality of the implementations (from the user's perspective,
not the programmer's). Example issues an end-user may have
questions about: Ticket Tracker and Ticket Tracker Lite, why
would I want one of them vs the other? And, before I decide
to download and install it, what credit card gateways are
supported by the current e-commerce module? There are some
packages where the name is clear enough, but what are the
limitations of the standard package?
</para></listitem>
<listitem><para>
End-user docs should not be duplicative. The package
description information and almost everything about a
package for administrators and developers is already
described in the package itself through two basic
development document templates: a <ulink url="http://openacs.org/doc/current/requirements-template.html">
Requirements Template</ulink> and <ulink
url="http://openacs.org/doc/current/filename.html">Detailed
Design Document</ulink>.
</para></listitem>
</itemizedlist>
</sect2>
<sect2 id="docs-admin-reqs" xreflabel="Administrators Documentation Requirements">
<title>OpenACS Documentation Requirements for Site Administrators</title>
<para>
By the OpenACS community. This section is a collection of
documentation requirements that have been expressed in the
OpenACS forums up to 4 July 2003.
</para>
<para>
OpenACS administrators' documentation should meet the following requirements. No
significance has been given to the order presented, topic breadth or depth here.
</para>
<itemizedlist>
<listitem><para>
For each requirement below, include links to developer tutorials
and other documentation for more detail.
</para></listitem>
<listitem><para>
Provide a structural overview of a working system and how
the components work together: "The Layered Cake view", a
general network view of the system, and a table showing system
levels versus roles to help explain how the
subsystems are interconnected.
</para></listitem>
<listitem><para>
Provide a comprehensive description of typical
administrative processes for operating an OpenACS system
responsibly, including reading logs and command line views that
describe status of various active processes.
</para></listitem>
<listitem><para>
Create a list of administrative tools that are useful for
administering OpenACS, including developer support,
schema-browser and API browser. Link to AOLserver's config
file documentation.
</para></listitem>
<listitem><para>
Resources on high-level subjects such as web services and
security guidelines
</para></listitem>
<listitem><para>
Describe typical skill sets (perhaps mapped to
standardized job titles) for administering an OpenACS
system (human-resources style). For a subsite
admin/moderator, attributes might include being trustworthy
and sociable, familiarity with the applications and
subsystems, work/group communication skills, et cetera.
</para></listitem>
<listitem><para>
Describe how to set up typical site moderation and
administration including parameters, permissions, "Hello
World" page
</para></listitem>
<listitem><para>
Show the directory structure of a typical package, explain
the various file types in a package (.tcl, .adp, .xql), how
those relate to the previously described subsystems, when
they get refreshed, etc.
</para></listitem>
<listitem><para>
Ways to build a "Hello World" page
</para></listitem>
<listitem><para>
Show examples of how the OpenACS templating system is used,
including portal sections of pages. For example, create a
customised auto-refreshing startpage using lars-blogger, a
photo gallery, and latest posts from a forum. This should
rely heavily on documentation existing elsewhere to keep
current. This would essentially be a heavily annotated list
of links.
</para></listitem>
<listitem><para>
Show ways of modifying the look and feel across pages of an
OpenACS website. Refer to the skins package tutorial.
</para></listitem>
<listitem><para>
Describe a methodology for diagnosing problems, finding
error statements and interpreting them, for both OpenACS and the
underlying processes.
</para></listitem>
<listitem><para>
FAQs: administration tasks commonly discussed on the boards, such as
admin page flow, how to change the look of a subsite with a
new master.adp, options on "user pages", a quick
introduction to the functions and processes, information about the
user variables, and file locations.
</para></listitem>
</itemizedlist>
</sect2>
<sect2 id="docs-install-reqs" xreflabel="Installation Documentation Requirements">
<title>OpenACS Installation Documentation Requirements</title>
<para>
By the OpenACS community. This section is a collection of
documentation requirements that have been expressed in the
OpenACS forums up to 4 July 2003.
</para>
<para>
OpenACS installation documentation should meet the following requirements. No
significance has been given to the order presented, topic breadth or depth here.
</para>
<itemizedlist>
<listitem><para>
state installation prerequisites. For example: "You should
read through the installation process to familiarize
yourself with the installation process, before beginning an
installation."
</para></listitem>
<listitem><para>
list critical decisions (perhaps as questions) that need to
be made before starting: which OS, which DB, which AOLserver
version, system name, dependencies et cetera. Maybe summarize
options as tables or decision-trees. For example, "As you
proceed throughout the installation, you will be acting on
decisions that have an impact on how the remaining part of
the system is installed. Here is a list of questions you
should answer before beginning."
</para></listitem>
<listitem><para>
list pre-installation assumptions
</para></listitem>
<listitem><para>
Show a chronological overview of the process of installing a
system to full working status: install the operating
system with supporting software, configure it in preparation
for OpenACS, install and configure the RDBMS(s), install and
configure the webserver, install and configure OpenACS, and
do post-install work.
</para></listitem>
</itemizedlist>
</sect2>
<sect2 id="docs-developer-tutorial-reqs" xreflabel="Developer Tutorial Documentation Requirements">
<title>OpenACS Developer Tutorial Documentation Requirements</title>
<para>
By the OpenACS community. This section is a collection of
documentation requirements that have been expressed in the
OpenACS forums up to 4 July 2003.
</para>
<para>
OpenACS developer tutorial documentation should meet the following requirements. No
significance has been given to the order presented, topic breadth or depth here.
</para>
<itemizedlist>
<listitem>
<para>
list learning prerequisites to customize, fix, and improve
OACS modules, and create new ones. You are expected to have
read and understood the information [minimum requirements
similar to adept at Using OpenACS Administrating Guide]
before reading this guide.
</para>
</listitem>
<listitem>
<para>
Refer to development documentation instead of duplicating here
</para>
</listitem>
<listitem><para>
List suggestions for installing and setting up a development
environment; these can be annotated links to the
installation documentation
</para>
</listitem>
<listitem><para>
Provide working examples that highlight the various
subsystems, Tcl environment, OpenACS protocols, AOLserver
template and ns_* commands, OpenACS templating, SQL queries,
db triggers, scheduling protocols, how to use the page contract,
how to get the accessing user_id etc
</para>
</listitem>
<listitem><para>
Show how to construct basic SQL queries using the db API.
</para>
</listitem>
<listitem><para>
The life of an HTTP request to a dynamic, templated page
</para>
</listitem>
<listitem><para>
General rules to follow for stability and scalability
</para>
</listitem>
<listitem><para>
Show the step by step customizing of an existing package
that meets current recommended coding styles of OpenACS
package development, by referring to developer resources.
</para>
</listitem>
<listitem><para>
Use the ArsDigita problem sets and "what Lars produced for ACS Java" as inspiration for a
PostgreSQL equivalent tutorial about developing a new
OpenACS package including discussion of the significance of
the package documentation templates
</para>
</listitem>
<listitem><para>
Include a summary of important links used by developers
</para>
</listitem>
<listitem><para>
Note any deprecated tools and methods by linking to prior
versions instead of describing them in current docs
</para>
</listitem>
</itemizedlist>
</sect2>
<sect2 id="docs-developer-reqs" xreflabel="Developer Documentation Requirements">
<title>OpenACS Developer Documentation Requirements</title>
<para>
By the OpenACS community. This section is a collection of
documentation requirements that have been expressed in the
OpenACS forums up to 4 July 2003.
</para>
<para>
OpenACS developer documentation should meet the following requirements. No
significance has been given to the order presented, topic breadth or depth here.
</para>
<itemizedlist>
<listitem><para>
list documentation assumptions, such as familiarity with
modifying OpenACS packages. All kernel docs are here etc.
</para></listitem>
<listitem><para>
This documentation should be written for ongoing use by
developers, not as a tutorial.
</para></listitem>
<listitem><para>
List of practical development and diagnostics tools and
methodologies.
</para></listitem>
<listitem><para>
List of OpenACS development resources, api-doc,
schema-browser, developer-support package etc.
</para></listitem>
<listitem><para>
Identify each OpenACS subsystem, explain why it is used
(instead of other choices). In the case of subsystems that
are developed outside of OpenACS, such as Tcl, include
external references to development and reference areas.
</para></listitem>
<listitem><para>
Show current engineering standards and indicate where
changes to the standards are in the works.
</para></listitem>
<listitem><para>
Sections should be dedicated to DotLRN standards as well, if
they are not available elsewhere.
</para></listitem>
<listitem><para>
Add overview diagrams showing the core parts of the
datamodel including an updated summary of Greenspun's
Chapter 4: Data Models and the Object System
</para></listitem>
<listitem><para>
package design guidelines and development process templates
including planning, core functions, testing, usability, and
creating case studies
</para></listitem>
<listitem><para>
Standard package conventions, where to see "model" code, and
guidelines (or where to find them) for:
<itemizedlist>
<listitem><para>
programming tcl/sql
</para></listitem>
<listitem><para>
using the acs-api
</para></listitem>
<listitem><para>
ad_form
</para></listitem>
<listitem><para>
coding permissions
</para></listitem>
<listitem><para>
OpenACS objects
</para></listitem>
<listitem><para>
scheduled protocols
</para></listitem>
<listitem><para>
callbacks
</para></listitem>
<listitem><para>
directory structure
</para></listitem>
<listitem><para>
user interface
</para></listitem>
<listitem><para>
widgets
</para></listitem>
<listitem><para>
package_name and type_extension_table
</para></listitem>
<listitem><para>
adding optional services, including search, general
comments, attachments, notifications, workflow, CR and
the new CR Tcl API
</para></listitem>
</itemizedlist>
</para></listitem>
<listitem><para>
Document kernel coding requirements, strategy and guidelines
to help code changers make decisions that meet kernel
designers' criteria
</para></listitem>
</itemizedlist>
</sect2>
<sect2 id="doc-strategy" xreflabel="Documenting strategy">
<title>OpenACS Documentation Strategy</title>
<para>
OpenACS documentation development is subject to the
constraints of the software project development and release
methods and cycles (<xref linkend="using-cvs-with-openacs"/>).
Essentially, all phases of work may be active to accommodate
the asynchronous nature of multiple subprojects evolving by
the efforts of a global base of participants with culturally
diverse time references and scheduling idiosyncrasies.
</para>
<para>
The documentation strategy is to use project methods to
involve others by collaborating or obtaining guidance or
feedback (peer review) to distribute the workload and increase
the overall value of output for the OpenACS project.
</para>
</sect2>
<sect2 id="dbprimer-why" xreflabel="Why DocBook?">
<title>OpenACS Documentation Strategy: Why DocBook?</title>
<para>
OpenACS documentation is taking a dual approach to publishing.
Documentation that is subject to rapid change and participation by
the OpenACS community is managed through the <ulink
url="http://openacs.org/xowiki/pages/en/Documentation_Project">OpenACS
xowiki Documentation Project</ulink>.
Formal documents that tend to remain static and require more
expressive publishing tools will be marked up to conform to the
<ulink url="http://docbook.org/xml/index.html">DocBook XML
DTD</ulink>. The remaining discussion is about publishing using
DocBook.
</para>
<para>
DocBook<indexterm><primary>DocBook</primary><secondary>DTD</secondary></indexterm>
is a publishing standard based on XML with
similar goals to the OpenACS Documentation project. Some specific reasons why we are using DocBook:
</para>
<itemizedlist>
<listitem><para>
It is open-source.
</para></listitem>
<listitem><para>
The DocBook community offers help through its <ulink url="http://docbook.org/help">mailing lists</ulink>.
</para></listitem>
<listitem><para>
A number of free and commercial
<ulink url="https://github.com/docbook/wiki/wiki/DocBookTools">tools</ulink> are available
for editing and publishing DocBook documents.
</para></listitem>
<listitem><para>
It enables us to publish in a variety of formats.
</para></listitem>
<listitem><para>
XML separates content from presentation: It relieves each
contributor of the burden of presentation, freeing each writer
to focus on content and sharing knowledge.
</para></listitem>
<listitem><para>
It is well-tested technology that has been in development
since the <ulink url="http://docbook.org/tdg/en/html/ch01.html#d0e2132">early 1990s</ulink>.
</para></listitem>
</itemizedlist>
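<para>
As a minimal illustration of separating content from presentation, the
fragment below marks up what each piece of text <emphasis>is</emphasis> (a
file name, a parameter, a literal value) and says nothing about fonts or
layout; the stylesheet used for HTML or PDF output decides how each element
looks. The file name, parameter name and value are hypothetical placeholders.
</para>
<programlisting>
&lt;para&gt;
  &lt;!-- Semantic inline markup only: no bold, italic or font choices here --&gt;
  Open &lt;filename&gt;config.tcl&lt;/filename&gt;, set
  &lt;varname&gt;httpport&lt;/varname&gt; to &lt;literal&gt;8000&lt;/literal&gt;, and
  &lt;emphasis&gt;restart the server&lt;/emphasis&gt; before testing.
&lt;/para&gt;
</programlisting>
<para>
Changing how file names or parameters look across every document then means
changing the stylesheet once, not editing each contributor's text.
</para>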
<para>
Reasons why we are using DocBook XML instead of DocBook SGML:
</para>
<itemizedlist>
<listitem><para>
<emphasis>Consistency</emphasis> and history. We started with a collection
of DocBook XML files that ArsDigita wrote. Trying to re-write them to
conform to the SGML DTD would be unnecessary work.
</para></listitem>
<listitem><para>
<emphasis>XML does not require extra
effort</emphasis>. Writing in XML is almost identical to
SGML, with a couple of extra rules. See the
<ulink url="http://www.tldp.org/LDP/LDP-Author-Guide/html/index.html">LDP Author Guide</ulink>.
</para></listitem>
<listitem><para>
<emphasis>The tool chain has matured</emphasis>. xsltproc and other XML-based
tools have improved to the point where they are about as good as
the SGML tools. Both can output HTML and PDF.
</para></listitem>
</itemizedlist>
<para>
However, the road to using DocBook has had some trials.
In 2002, DocBook was still not fully capable of representing
online books as practiced by book publishers and expected by
readers with regard to usability on the web. That meant
DocBook did not entirely meet OpenACS publishing requirements
at that time.
</para>
<para>
In 2004, DocBook released version 4.4, which meets all
the OpenACS publishing requirements.
Producing a web-friendly book hierarchy arguably remains DocBook's
weakest point. For example, a dynamically built document
should be able to extract the details of a specific reference from
a bibliography (table) and present them as a footnote at the
point where the reference occurs. DocBook 4.4 allows for this with
<computeroutput>bibliocoverage</computeroutput>,
<computeroutput>bibliorelation</computeroutput>, and
<computeroutput>bibliosource</computeroutput>. <ulink
url="http://www.docbook.org/tdg/en/html/docbook.html">DocBook:
The Definitive Guide</ulink> is a good start for learning how
to represent paper-based books online.
</para>
<para>
The following DocBook primer walks you through the basics, and should cover the
needs for 95 percent of the documentation we produce. You are welcome to explore DocBook's
<ulink url="http://docbook.org/tdg/en/html/part2.html">
list of elements</ulink> and use more exotic features in your
documents. The list describes SGML elements, but essentially
the same elements are valid in the XML DTD <emphasis
role="strong">as long as you remember to</emphasis>:
<indexterm><primary>XML guidelines</primary></indexterm>
</para>
<itemizedlist>
<listitem>
<para>
Close every tag with its corresponding end-tag and
<emphasis role="strong">avoid any other tag minimization</emphasis>
</para>
</listitem>
<listitem>
<para>
Write all elements and attributes in lowercase
</para>
</listitem>
<listitem><para>
Quote all attribute values
</para></listitem>
</itemizedlist>
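<para>
For illustration, here is the same hypothetical fragment written twice. The first
version uses SGML-style shortcuts (uppercase names, an unquoted attribute value,
and an unclosed element) that an XML parser rejects; the second follows the three
rules above and is well-formed DocBook XML. The <computeroutput>linkend</computeroutput>
value is made up for the example.
</para>
<programlisting>
<!-- SGML-style shortcuts: NOT acceptable in DocBook XML -->
<PARA>See <XREF LINKEND=mydoc-details> for more.

<!-- Well-formed XML: lowercase names, quoted values, every element closed -->
<para>See <xref linkend="mydoc-details"/> for more.</para>
</programlisting>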
</sect2>
<sect2 id="dbprimer-validation" xreflabel="Docbook Tools">
<title>Tools</title>
<para>
You are going to need the following to work with the OpenACS
DocBook XML documentation:
</para>
<itemizedlist>
<listitem>
<para>
<ulink url="http://docbook.org/xml/index.html">DocBook XML
DTD</ulink> - The document type definition for DocBook XML. It is
available as an RPM or DEB package, or as a zip file from
the site linked here.
</para>
</listitem>
<listitem>
<para>
<ulink url="http://sourceforge.net/projects/docbook/">XSL
Stylesheets</ulink> (docbook-xsl) - The stylesheets that convert
DocBook XML to HTML. We have been using a stylesheet based upon
Norman Walsh's chunk.xsl.
</para>
</listitem>
<listitem>
<para>
<computeroutput>xsltproc</computeroutput> - The processor that
takes an XML document and, given an XSL stylesheet, converts
it to HTML. It needs libxml2 and libxslt (available in RPM and
DEB formats or from <ulink
url="http://xmlsoft.org/">xmlsoft.org</ulink>).
</para>
</listitem>
<listitem>
<para>
An editor. A popular choice is Emacs with the PSGML or nXML
mode. The <ulink
url="http://www.tldp.org/LDP/LDP-Author-Guide/html/index.html">LDP Author
Guide</ulink> and <ulink
url="https://github.com/docbook/wiki/wiki/DocBookTools">DocBook
Wiki</ulink> list some alternatives.
</para>
</listitem>
</itemizedlist>
</sect2>
<sect2 id="dbprimer-new-doc" xreflabel="Writing New Docs">
<title>Writing New Docs</title>
<para>
After you have the tools mentioned above, you need to define a
title for your document. Then start thinking about the possible
sections and subsections you will have in your document.
Coordinate with the OpenACS Gatekeepers to make sure
you are not writing something that someone else is already
writing. Also, if you want to use the OpenACS CVS repository,
please e-mail the gatekeeper in charge of documentation.
</para>
<para>
You can look at some templates for documents (in DocBook XML) in
the <ulink
url="https://github.com/openacs/openacs-core/tree/oacs-5-9/packages/acs-core-docs/www/xml/engineering-standards">sources
for acs-core-docs</ulink>, especially the <emphasis>
Detailed Design Documentation Template</emphasis> and the
<emphasis>System/Application Requirements Template</emphasis>.
</para>
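<para>
A minimal skeleton for a new document might look like the following. The
header mirrors the one at the top of this primer's own source file (the DocBook
XML 4.4 DTD plus an entity file, <computeroutput>variables.ent</computeroutput>);
the <computeroutput>id</computeroutput>, the titles, and the relative path to
<computeroutput>variables.ent</computeroutput> are placeholders to adjust for
your package.
</para>
<programlisting>
<?xml version='1.0' ?>
<!DOCTYPE sect1 PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd" [
<!ENTITY % myvars SYSTEM "../variables.ent">
%myvars;
]>
<sect1 id="mypkg-requirements" xreflabel="My Package Requirements">
  <title>My Package Requirements</title>
  <para>
    One or two sentences describing what this document covers.
  </para>
</sect1>
</programlisting>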
</sect2>
<sect2 id="dbprimer-structure">
<title>Document Structure</title>
<para>
The documentation for each package will make up a little "book" that is structured like this
- examples are <emphasis>emphasized</emphasis>:
<indexterm><primary>Document structure</primary></indexterm>
</para>
<programlisting>
book : <emphasis role="strong">Docs for one package</emphasis> - <emphasis>templating</emphasis>
|
+--chapter : <emphasis role="strong">One section</emphasis> - <emphasis>for developers</emphasis>
|
---------+------------------------------------------------------
|
+--sect1 : <emphasis role="strong">Single document</emphasis> - <emphasis>requirements</emphasis>
|
+--sect2 : <emphasis role="strong">Sections</emphasis> - <emphasis>functional requirements</emphasis>
|
+--sect3 : <emphasis role="strong">Subsections</emphasis> - <emphasis>Programmer's API</emphasis>
|
... : <emphasis role="strong">...</emphasis>
</programlisting>
<para>
The actual content is split up into documents that start at a
<computeroutput>sect1</computeroutput>-level. These are then tied together in a top-level document that
contains all the information above the line. This will be explained in more detail in a later document,
and we will provide a set of templates for documenting an entire package. </para>
<para>For now you can take a look at the
<ulink url="https://github.com/openacs/openacs-core/tree/oacs-5-9/packages/acs-core-docs/www/xml/engineering-standards">sources of these DocBook documents</ulink>
to get an idea of how they are tied together.
</para>
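<para>
One possible way to tie the <computeroutput>sect1</computeroutput> documents
together is sketched below, assuming XInclude is used to pull each file into a
top-level <computeroutput>book</computeroutput>. All file names and
<computeroutput>id</computeroutput>s are hypothetical. xsltproc only expands the
<computeroutput>xi:include</computeroutput> elements when run with its
<computeroutput>--xinclude</computeroutput> option, and a validating parser may
complain about them because the stock DocBook 4.4 DTD does not declare them.
</para>
<programlisting>
<?xml version='1.0' ?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd">
<book id="mypkg" xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>My Package Documentation</title>
  <chapter id="mypkg-for-developers">
    <title>For Developers</title>
    <!-- each included file is a complete document whose root element is sect1 -->
    <xi:include href="mypkg-requirements.xml"/>
    <xi:include href="mypkg-design.xml"/>
  </chapter>
</book>
</programlisting>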
</sect2>
<sect2 id="dbprimer-sections">
<title>Headlines, Sections</title>
<para>
<indexterm><primary>Sections</primary><secondary>Headlines</secondary></indexterm>
Given that your job starts at the <computeroutput>sect1</computeroutput>-level, all your documents should open with a
<ulink url="http://docbook.org/tdg/en/html/sect1.html"><computeroutput><sect1></computeroutput></ulink>-tag and end
with the corresponding <computeroutput></sect1></computeroutput>.
</para>
<para>
<indexterm><primary>sect1</primary></indexterm>
Give every <computeroutput><sect1></computeroutput> two attributes. The first attribute,
<computeroutput>id</computeroutput>, is standard and can be used with all elements. It comes in very
handy when interlinking between documents (more about this when talking about links in <xref linkend="dbprimer-links"/>).
The value of <computeroutput>id</computeroutput> has to be unique
throughout the book you're making, since the <computeroutput>id</computeroutput>s of your
<computeroutput>sect1</computeroutput> elements will turn into filenames when the book is converted to HTML.
</para>
<para>
<indexterm><primary>xreflabel</primary></indexterm>
The other attribute is <computeroutput>xreflabel</computeroutput>. The value of this is the text that will appear
as the link when referring to this <computeroutput>sect1</computeroutput>.
</para>
<para>
Right after the opening tag you put the title of the document; this is usually the same as the
<computeroutput>xreflabel</computeroutput> attribute. For example, the top level of a document
like the one you're reading could look like this:
</para>
<programlisting>
<sect1 id="docbook-primer" xreflabel="DocBook Primer">
<title>DocBook Primer</title>
...
</sect1>
</programlisting>
<para>
<indexterm><primary>sect2</primary></indexterm>
Inside this container your document will be split up into
<ulink url="http://docbook.org/tdg/en/html/sect2.html"><computeroutput><sect2></computeroutput></ulink> elements,
each following the same pattern: an <computeroutput>id</computeroutput> attribute, an <computeroutput>xreflabel</computeroutput>
attribute, and a <computeroutput><title></computeroutput> element inside. Strictly speaking, <computeroutput>xreflabel</computeroutput> is never required in sections, but it makes linking to that section a lot easier.
</para>
<para>
When it comes to naming your
<computeroutput>sect2</computeroutput>s and lower-level sections, prefix their <computeroutput>id</computeroutput>s with an abbreviation of the <computeroutput>id</computeroutput> in the <computeroutput>sect1</computeroutput>, such as <computeroutput>requirements-overview</computeroutput>.
</para>
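<para>
Applied to a hypothetical requirements document, these conventions could look
like this: every section has an <computeroutput>id</computeroutput> prefixed with
the <computeroutput>sect1</computeroutput>'s own <computeroutput>id</computeroutput>
and a <computeroutput><title></computeroutput>, while the
<computeroutput>xreflabel</computeroutput> is optional (the
<computeroutput>sect3</computeroutput> below omits it).
</para>
<programlisting>
<sect1 id="requirements" xreflabel="Requirements">
  <title>Requirements</title>
  <sect2 id="requirements-overview" xreflabel="Requirements Overview">
    <title>Requirements Overview</title>
    <para>...</para>
  </sect2>
  <sect2 id="requirements-functional" xreflabel="Functional Requirements">
    <title>Functional Requirements</title>
    <sect3 id="requirements-api">
      <title>Programmer's API</title>
      <para>...</para>
    </sect3>
  </sect2>
</sect1>
</programlisting>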
</sect2>
<sect2 id="dbprimer-code">
<title>Code</title>
<para>
<indexterm><primary>computeroutput</primary><secondary>code</secondary></indexterm>
For displaying a snippet of code, a filename, or anything else you want to appear as part of
a sentence, we use the
<ulink
url="http://docbook.org/tdg/en/html/computeroutput.html"><computeroutput><computeroutput></computeroutput></ulink>
and <ulink
url="http://docbook.org/tdg/en/html/code.html"><code><code></code></ulink>
tags.
These replace the HTML <code><code></code> tag; choose one or the other
depending on whether the text is computer output or
computer code.
</para>
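<para>
A short illustration of both inline tags in one sentence. The file name
<computeroutput>mypkg-procs.tcl</computeroutput> and the
<computeroutput>debug_p</computeroutput> setting are hypothetical, chosen only
to show a filename next to a fragment of Tcl code.
</para>
<programlisting>
<para>
  Open <computeroutput>mypkg-procs.tcl</computeroutput> and change
  <code>set debug_p 0</code> to <code>set debug_p 1</code>.
</para>
</programlisting>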
<para>
For bigger chunks of code such as SQL blocks, use the
<ulink url="http://docbook.org/tdg/en/html/programlisting.html"><computeroutput><programlisting></computeroutput></ulink> tag. Just wrap your code block in it; the text is shown in a monospaced font, and its line breaks and indentation are preserved
automatically.
</para>
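<para>
For example, a hypothetical query against a <computeroutput>notes</computeroutput>
table could be marked up as shown below. Keep in mind that the text inside
<computeroutput>programlisting</computeroutput> is still parsed as XML, so characters
that XML treats specially must be escaped as entities, as described for ampersands in
<xref linkend="dbprimer-links"/>.
</para>
<programlisting>
<programlisting>
select title
  from notes
 where note_id = :note_id
</programlisting>
</programlisting>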
<para>To show user interaction in a terminal window, wrap
the text in a <ulink
url="http://docbook.org/tdg/en/html/screen.html"><computeroutput><screen></computeroutput></ulink>
tag, and inside it mark up the text with combinations of <ulink
url="http://docbook.org/tdg/en/html/computeroutput.html"><computeroutput><computeroutput></computeroutput></ulink>
(for prompts and output) and <ulink
url="http://docbook.org/tdg/en/html/userinput.html"><userinput><userinput></userinput></ulink> (for what the user types).
</para>
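<para>
For instance, a short shell session could be marked up as follows, with the prompt
and the program's output in <computeroutput>computeroutput</computeroutput> and the
typed commands in <computeroutput>userinput</computeroutput>. The commands themselves
are only an example.
</para>
<programlisting>
<screen>
<computeroutput>bash$ </computeroutput><userinput>cd /tmp</userinput>
<computeroutput>bash$ </computeroutput><userinput>pwd</userinput>
<computeroutput>/tmp</computeroutput>
</screen>
</programlisting>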
</sect2>
<sect2 id="dbprimer-links">
<title>Links</title>
<para>
<indexterm><primary>Linking</primary></indexterm>
Linking falls into two different categories: inside the book you're making and outside:
</para>
<variablelist>
<varlistentry>
<term><emphasis role="strong">1. Inside linking, cross-referencing other parts of your book</emphasis></term>
<listitem><para>
By having unique <computeroutput>id</computeroutput>s you can cross-reference any part of your book
with a simple tag, regardless of where that part is.
</para>
<para><indexterm><primary>xref</primary><secondary>linkend</secondary></indexterm>Check out how I link to a subsection of the Developer's Guide:</para>
<para>Put this in your XML:</para>
<programlisting>
- Find information about creating a package in
<xref linkend="packages-making-a-package"></xref>.
</programlisting>
<para>And the output is:</para>
<programlisting>
- Find information about creating a package in
<xref linkend="packages-making-a-package"/>.
</programlisting>
<para>
Note that even though this is an empty tag, you have to either:
</para>
<orderedlist>
<listitem>
<para>
Provide the end-tag, <computeroutput></xref></computeroutput>, or
</para>
</listitem>
<listitem>
<para>
Put a slash before the ending-bracket: <computeroutput><xref linkend="blahblah"/></computeroutput>
</para>
</listitem>
</orderedlist>
<para>If the section you link to does not have an <computeroutput>xreflabel</computeroutput> attribute,
the link is going to look like this:</para>
<para>Put this in your XML:</para>
<programlisting>
- Find information about what a package looks like in
<xref linkend="packages-looks"></xref>.
</programlisting>
<para>And the output is:</para>
<programlisting>
- Find information about what a package looks like in
<xref linkend="packages-looks"/>.
</programlisting>
<para>
Note that since I haven't provided an <computeroutput>xreflabel</computeroutput> for the subsection,
<computeroutput>packages-looks</computeroutput>, the
stylesheet generates the link text from the target section itself, typically from its number and title.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><emphasis role="strong">2. Linking outside the documentation</emphasis></term>
<listitem><para>
<indexterm><primary>ulink</primary></indexterm>
If you're hyperlinking out of the documentation, it works almost the same way as in HTML; only the tag is
a little different
(<ulink url="http://docbook.org/tdg/en/html/ulink.html"><computeroutput><ulink></computeroutput></ulink>):
</para>
<programlisting><ulink url="http://www.oracle.com/">Oracle Corporation</ulink></programlisting>
<para>
...creates a hyperlink to Oracle in the HTML version of the documentation.
</para>
<para><emphasis role="strong">NOTE:</emphasis> Do NOT use
literal ampersands in your hyperlinks. Ampersands are reserved for
referencing <ulink
url="http://www.docbook.org/tdg/en/html/ch01.html#s-entities">entities</ulink>.
To include an ampersand, for example in a URL query string, use the entity <code>&amp;</code>.
</para></listitem>
</varlistentry>
</variablelist>
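<para>
As an example of the ampersand rule, a link to a hypothetical URL with two query
parameters is written like this in the XML source:
</para>
<programlisting>
<ulink url="http://www.example.com/search?q=docbook&amp;page=2">Search results</ulink>
</programlisting>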
</sect2>
<sect2 id="dbprimer-graphics">
<title>Graphics</title>
<para>
<emphasis>
<emphasis role="strong">Note:</emphasis> The graphics guidelines
are not written in stone. Use another valid approach if it works better
for you.
</emphasis>
</para>
<para>
<indexterm><primary>Graphics</primary><secondary>Images</secondary></indexterm>
To insert a graphic we use the elements
<ulink url="http://docbook.org/tdg/en/html/mediaobject.html"><computeroutput><mediaobject></computeroutput></ulink>,
<ulink url="http://docbook.org/tdg/en/html/imageobject.html"><computeroutput><imageobject></computeroutput></ulink>,
<ulink url="http://docbook.org/tdg/en/html/imagedata.html"><computeroutput><imagedata></computeroutput></ulink>,
and
<ulink url="http://docbook.org/tdg/en/html/textobject.html"><computeroutput><textobject></computeroutput></ulink>.
Every graphic needs two versions: one for the Web
(usually a JPEG or GIF) and a brief text description. The
description becomes the ALT text. You can also supply a version for print (EPS).
</para>
<programlisting>
<mediaobject>
<imageobject>
<imagedata fileref="images/rp-flow.gif" format="GIF" align="center"/>
</imageobject>
<imageobject>
<imagedata fileref="images/rp-flow.eps" format="EPS" align="center"/>
</imageobject>
<textobject>
<phrase>This is an image of the flow in the Request Processor</phrase>
</textobject>
</mediaobject>
</programlisting>
<para>
Put your graphics in a separate directory ("images") and link to them
only with relative paths.
</para>
</sect2>
<sect2 id="dbprimer-lists">
<title>Lists</title>
<para>
<indexterm><primary>lists</primary></indexterm>
Here's how you make the DocBook equivalent of the three usual HTML-lists:
</para>
<variablelist>
<varlistentry>
<term><emphasis role="strong">1. How to make an <ul></emphasis></term>
<listitem><para>
Making an unordered list is pretty much like doing the same thing in HTML, provided you close your <computeroutput><li></computeroutput>. The only differences are that the content of each list item has to be wrapped in a block element such as
<computeroutput><para></computeroutput>, and that the tags are called
<ulink url="http://docbook.org/tdg/en/html/itemizedlist.html"><computeroutput><itemizedlist></computeroutput></ulink>
and
<ulink url="http://docbook.org/tdg/en/html/listitem.html"><computeroutput><listitem></computeroutput></ulink>:
</para>
<programlisting>
<itemizedlist>
<listitem><para>Stuff goes here</para></listitem>
<listitem><para>More stuff goes here</para></listitem>
</itemizedlist>
</programlisting>
</listitem>
</varlistentry>
<varlistentry>
<term><emphasis role="strong">2. How to make an <ol></emphasis></term>
<listitem><para>
The ordered list is like the preceding, except that you use
<ulink url="http://docbook.org/tdg/en/html/orderedlist.html"><computeroutput><orderedlist></computeroutput></ulink> instead:</para>
<programlisting>
<orderedlist>
<listitem><para>Stuff goes here</para></listitem>
<listitem><para>More stuff goes here</para></listitem>
</orderedlist>
</programlisting>
</listitem>
</varlistentry>
<varlistentry>
<term><emphasis role="strong">3. How to make a <dl></emphasis></term>
<listitem><para>
This kind of list is called a <computeroutput>variablelist</computeroutput> and these are the tags you'll need to
make it happen:
<ulink url="http://docbook.org/tdg/en/html/variablelist.html"><computeroutput><variablelist></computeroutput></ulink>,
<ulink url="http://docbook.org/tdg/en/html/varlistentry.html"><computeroutput><varlistentry></computeroutput></ulink>,
<ulink url="http://docbook.org/tdg/en/html/term.html"><computeroutput><term></computeroutput></ulink> and
<ulink url="http://docbook.org/tdg/en/html/listitem.html"><computeroutput><listitem></computeroutput></ulink>:</para>
<programlisting>
<variablelist>
<varlistentry>
<term>Heading (<dt>) goes here</term>
<listitem><para>And stuff (<dd>) goes here</para></listitem>
</varlistentry>
<varlistentry>
<term>Another heading goes here</term>
<listitem><para>And more stuff goes here</para></listitem>
</varlistentry>
</variablelist>
</programlisting>
</listitem>
</varlistentry>
</variablelist>
</sect2>
<sect2 id="dbprimer-tables">
<title>Tables</title>
<para>
<indexterm><primary>informaltable</primary><secondary>table</secondary></indexterm>
DocBook supports several types of tables, but in most cases, the
<ulink url="http://docbook.org/tdg/en/html/informaltable.html"><computeroutput><informaltable></computeroutput></ulink>
is enough:
</para>
<programlisting>
<informaltable frame="all">
<tgroup cols="3">
<tbody>
<row>
<entry>a1</entry>
<entry>b1</entry>
<entry>c1</entry>
</row>
<row>
<entry>a2</entry>
<entry>b2</entry>
<entry>c2</entry>
</row>
<row>
<entry>a3</entry>
<entry>b3</entry>
<entry>c3</entry>
</row>
</tbody>
</tgroup>
</informaltable>
</programlisting>
<para>
With our current XSL stylesheet, the output of the markup above will be a simple HTML table:
</para>
<informaltable frame="all">
<tgroup cols="3">
<tbody>
<row>
<entry>a1</entry>
<entry>b1</entry>
<entry>c1</entry>
</row>
<row>
<entry>a2</entry>
<entry>b2</entry>
<entry>c2</entry>
</row>
<row>
<entry>a3</entry>
<entry>b3</entry>
<entry>c3</entry>
</row>
</tbody>
</tgroup>
</informaltable>
<para>
If you want cells to span more than one row or column, it gets a bit more complicated - check out
<ulink url="http://docbook.org/tdg/en/html/table.html"><computeroutput><table></computeroutput></ulink>
for an example.
</para>
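<para>
A sketch of how spanning works in the CALS table model that DocBook uses: name the
columns with <computeroutput>colspec</computeroutput>, span columns with the
<computeroutput>namest</computeroutput> and <computeroutput>nameend</computeroutput>
attributes of <computeroutput>entry</computeroutput>, and span rows with
<computeroutput>morerows</computeroutput>. The cell contents are placeholders, and the
exact rendering depends on the stylesheet.
</para>
<programlisting>
<informaltable frame="all">
  <tgroup cols="3">
    <colspec colname="c1"/>
    <colspec colname="c2"/>
    <colspec colname="c3"/>
    <tbody>
      <row>
        <!-- this entry covers columns c1 through c2 -->
        <entry namest="c1" nameend="c2">a1-b1 (two columns)</entry>
        <entry>c1</entry>
      </row>
      <row>
        <!-- morerows="1" extends this entry one row further down -->
        <entry morerows="1">a2-a3 (two rows)</entry>
        <entry>b2</entry>
        <entry>c2</entry>
      </row>
      <row>
        <!-- the first column is still occupied, so these fill columns 2 and 3 -->
        <entry>b3</entry>
        <entry>c3</entry>
      </row>
    </tbody>
  </tgroup>
</informaltable>
</programlisting>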
</sect2>
<sect2 id="dbprimer-emphasis">
<title>Emphasis</title>
<para>
<indexterm><primary>emphasis</primary><secondary>bold, italics</secondary></indexterm>
Our documentation uses two flavors of emphasis - italics and bold type. DocBook uses one -
<ulink url="http://docbook.org/tdg/en/html/emphasis.html"><computeroutput><emphasis></computeroutput></ulink>.
</para>
<para>
The <computeroutput><emphasis></computeroutput> tag is rendered in italics by default. For
bold emphasis, use <computeroutput><emphasis role="strong"></computeroutput>.
</para>
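<para>
A one-sentence example with both flavors; with the stylesheets described here the
first word is expected to come out in italics and the second in bold:
</para>
<programlisting>
<para>
  This step is <emphasis>optional</emphasis>, but the next one is
  <emphasis role="strong">required</emphasis>.
</para>
</programlisting>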
</sect2>
<sect2 id="dbprimer-indexing" xreflabel="Indexing Your DocBook">
<title>Indexing Your DocBook Documents</title>
<para>
Terms marked as index terms are listed in an index
in the final, processed document.
</para>
<para>
Use
<ulink url="http://docbook.org/tdg/en/html/indexterm.html"><computeroutput><indexterm></computeroutput></ulink>,
<ulink url="http://docbook.org/tdg/en/html/primary.html"><computeroutput><primary></computeroutput></ulink> and
<ulink url="http://docbook.org/tdg/en/html/secondary.html"><computeroutput><secondary></computeroutput></ulink>
for this. See these links for an explanation.
</para>
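<para>
A minimal example, with terms chosen only for illustration. The
<computeroutput>indexterm</computeroutput> elements produce no visible text
themselves; they mark the spot that the index entries point to. With the DocBook XSL
stylesheets, the index is typically generated where the book contains an empty
<computeroutput><index/></computeroutput> element.
</para>
<programlisting>
<para>
  <indexterm><primary>xsltproc</primary></indexterm>
  <indexterm><primary>stylesheets</primary><secondary>chunking</secondary></indexterm>
  Run xsltproc with <computeroutput>chunk.xsl</computeroutput> to split the
  output into several linked HTML pages.
</para>
</programlisting>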
</sect2>
<sect2 id="dbprimer-converting" xreflabel="Converting to HTML">
<title>Converting to HTML</title>
<note>
<para>This section is quoted almost verbatim from the LDP Author Guide.</para>
</note>
<para>
Once you have the <xref linkend="dbprimer-validation"/>
installed, you can convert your XML documents to HTML or other
formats.
</para>
<para>
With the DocBook XSL stylesheets, generation of multiple files
is controlled by the stylesheet. If you want to generate a
single file, you call one stylesheet. If you want to generate
multiple files, you call a different stylesheet.
</para>
<para>
To generate a single HTML file from your DocBook XML file,
use the command:
</para>
<screen>
<computeroutput>bash$ </computeroutput><userinput>xsltproc -o outputfilename.html /usr/share/sgml/docbook/stylesheet/xsl/nwalsh/html/docbook.xsl filename.xml</userinput>
</screen>
<note>
<para>
This example uses Daniel Veillard's <emphasis
role="strong">xsltproc</emphasis> command available
as part of libxslt from <ulink
url="http://www.xmlsoft.org/XSLT/">http://www.xmlsoft.org/XSLT/</ulink>.
If you are using other XML processors such as Xalan or Saxon,
you will need to change the command line appropriately.
</para>
</note>
<para>
To generate a set of linked HTML pages, with a separate page
for each <chapter>, <sect1> or <appendix> tag, use the
following command:
</para>
<screen>
<computeroutput>bash$ </computeroutput><userinput>xsltproc /usr/share/sgml/docbook/stylesheet/xsl/nwalsh/html/chunk.xsl filename.xml</userinput>
</screen>
<para>
You could also look at the <ulink url="https://raw.githubusercontent.com/openacs/openacs-core/master/packages/acs-core-docs/www/xml/Makefile">acs-core-docs Makefile</ulink>
for examples of how these documents are generated.
</para>
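<para>
Instead of calling <computeroutput>chunk.xsl</computeroutput> directly, you can write a
small customization layer that imports it and sets a few parameters. The sketch below
is hypothetical and is not the stylesheet OpenACS actually uses (see the Makefile above
for that). <computeroutput>use.id.as.filename</computeroutput>,
<computeroutput>chunk.section.depth</computeroutput> and
<computeroutput>html.stylesheet</computeroutput> are DocBook XSL parameters, while the
stylesheet path and the CSS file name are placeholders. Saved as, say,
<computeroutput>my-chunk.xsl</computeroutput>, it would be run as
<computeroutput>xsltproc my-chunk.xsl filename.xml</computeroutput>.
</para>
<programlisting>
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- import the stock chunking stylesheet; adjust the path to your installation -->
  <xsl:import href="/usr/share/sgml/docbook/stylesheet/xsl/nwalsh/html/chunk.xsl"/>

  <!-- name each HTML file after the id of its chunk, e.g. docbook-primer.html -->
  <xsl:param name="use.id.as.filename" select="1"/>

  <!-- start a new page for each sect1 (chapters always get their own page) -->
  <xsl:param name="chunk.section.depth" select="1"/>

  <!-- hypothetical CSS file linked from every generated page -->
  <xsl:param name="html.stylesheet" select="'mystyle.css'"/>
</xsl:stylesheet>
</programlisting>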
</sect2>
<sect2 id="db-primer-further-reading" xreflabel="Docbook Further Reading">
<title>Further Reading</title>
<itemizedlist>
<listitem>
<para><ulink url="http://www.xml.com/lpt/a/2002/07/31/xinclude.html">Using XInclude</ulink></para>
</listitem>
<listitem>
<para>
The <ulink
url="http://www.tldp.org/LDP/LDP-Author-Guide/html/index.html">LDP Author
Guide</ulink> has a lot of good information, a table of
DocBook elements and their "look" in HTML and lots of good links
for tools.
</para>
</listitem>
<listitem><para>
James Clark
wrote <link linkend="nxml-mode">nXML Mode</link>, an alternative
to PSGML Mode. nXML Mode can validate a file as it is edited.
</para></listitem>
<listitem><para>
David Lutterkort
wrote an <link linkend="psgml-mode">intro to the PSGML Mode in Emacs</link>.
</para></listitem>
<listitem><para>
James Clark's free Java parser
<ulink url="http://www.jclark.com/xml/xp/index.html">XP</ulink>. Note that
this does not validate XML, only parses it.
</para></listitem>
<listitem><para>
<ulink url="http://sources.redhat.com/docbook-tools/">DocBook Tool for Linux</ulink>:
Converts DocBook documents to a number of formats. <emphasis>NOTE: I only got these to
work with DocBook SGML, NOT with DocBook XML. If you are
able to make it work with our XML, please let us know.</emphasis>
</para>
</listitem>
<listitem><para>
AptConvert from <ulink url="http://www.pixware.fr/">PIXware</ulink> is a Java editor that will produce
DocBook documents and let you transform them into HTML and PDF for a local preview before you submit.
</para>
</listitem>
<listitem>
<para>
In the process of transforming your HTML into XML,
<ulink url="http://tidy.sourceforge.net/">HTML tidy</ulink>
can be a handy tool to make your HTML "regexp'able".
Brandoch Calef has made a
<ulink
url="http://web.archive.org/web/20010830084757/http://developer.arsdigita.com/working-papers/bcalef/html-to-docbook.html">Perl
script with directions</ulink> (now via archive.org)
that gets you most of the way.
</para>
</listitem>
</itemizedlist>
</sect2>
</sect1>