Forum OpenACS Q&A: Enhancing the Installation Process: Automation.

There's a thread that's ongoing at aD about the ease of installation of ACS and OpenACS. I thought I'd bring it over here to stimulate discussion. I am serious about working with the community to develop easier install routines. [including the development of RPMs that auto-install the whole works]

> Posted by: Adam Farkas (afarkas@arsdigita.com)
> Topic : web/db
> Subject : Response to Installation Time?
>
> Well... I got my OpenACS box installed last night, however it took
> far longer than Roberto's estimates, mostly due to problems compiling
> the postgres driver. [thank goodness this forum exists, or else I
> would never have gotten it done..]

> I propose that we [that is, the members of the OpenACS community]
> strive to build some RPMs that can _fully_ automate the install
> process of a "default" ACS setup. This is very similar to the way
> that Zope works -- they have a version of it that is distributed with
> "Zap", a modified apache IIRC. The whole mess installs in < 2
> minutes, and boom -- you're up & running.

> We need something similar for the OpenACS. Something that can
> install, if need be, AOLserver (with the PG driver), PG7, and
> OpenACS, load the data model & get the service started automatically
> at boot.

> Are there people in the community who can make this happen?
> ArsDigita is willing to devote resources to helping this process out,
> if necessary.
thanks.

There are 2 parts to the install process, at least that is how I would
break it down:  installing the software needed, and then configuring
the bits so that they reflect what is desired.

Part 1. is "get the stuff on the hard drive, and up and running"  this
means that OpenACS files, Postgres, AOLserver are installed on the
system, and that AOLserver can talk to Postgres.  A directory (say
/web/server1/) is chosen and the ACS files are moved over.  Then, the
data-model.sql file is loaded and checked for errors.
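
The last step of Part 1, loading the data model and checking it for errors, could be sketched roughly as below. This is only an illustration under stated assumptions: the database name `openacs` and both function names are made up, and the check simply assumes that `psql` reports failed statements on lines containing `ERROR:`.

```python
import subprocess


def find_errors(psql_output: str) -> list[str]:
    """Return the output lines that report an error."""
    return [line for line in psql_output.splitlines() if "ERROR:" in line]


def load_data_model(sql_file: str, dbname: str = "openacs") -> list[str]:
    """Feed sql_file to psql and return any error lines it printed.

    The database name is an assumption; psql must be on the PATH.
    """
    result = subprocess.run(
        ["psql", "-f", sql_file, dbname],
        capture_output=True,
        text=True,
    )
    return find_errors(result.stdout + result.stderr)
```

An empty list from `load_data_model("data-model.sql")` would mean no error lines were seen, which is a weaker guarantee than reading the output by hand.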

Part 2. is unknown territory, I think, for ACS since there has never
really been a configuration interface for ACS.

That is, there are options that can only be configured by hand-editing
certain files.  A text-mode Perl or TCL program that asked some
customization questions and kept the rest as sane defaults (and backed
up your nsd.tcl at the same time) would be good.  Would it make sense
to have a "setup server" that let you configure the ACS installation?
Would it make sense to have a very minimal, bootstrap nsd.tcl, and
once you got the DBMS connection working, ALL other settings were in
the database? -- that would make it much easier to set up Web forms to
allow configuration.
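
A text-mode configurator of the kind described here might start from something like the following sketch. It is written in Python rather than Perl or Tcl purely for illustration, and the setting names and default values are hypothetical, not the real ACS parameters.

```python
import shutil
from pathlib import Path

# Hypothetical defaults; the real list of settings would come from the ACS config.
DEFAULTS = {"server_name": "server1", "port": "80", "db_name": "openacs"}


def merge_answers(answers: dict[str, str]) -> dict[str, str]:
    """Use the admin's answer where one was given, otherwise keep the default."""
    merged = dict(DEFAULTS)
    merged.update({key: value for key, value in answers.items() if value.strip()})
    return merged


def backup_config(config_path: Path) -> Path:
    """Copy the config file to <name>.bak before anything rewrites it."""
    backup = config_path.with_name(config_path.name + ".bak")
    shutil.copy2(config_path, backup)
    return backup
```

Blank answers fall back to the defaults, so an admin could press Enter through most questions and still end up with a usable configuration.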

I looked at Zope - I found its performance and ODB underwhelming (if
not laughable - then again I have not tweaked it).  But the 2-minute
install, and nice Web interface to security and configuration make it
pretty nice...

./././patr

Right now I'm focusing on Part #1 -- getting it up & running. It's a well-defined problem that I think we can tackle.

As for Part #2, an HTML interface is something that we should seriously consider for many administrative tasks.  But it's probably a bigger job.

For now, I'd be thrilled to get some RPMs and a 5-minute no-hassle install.

Posted by Don Baccus on
You should talk to Lamar Owen, who has done the RedHat RPMs for Postgres.  At one time, he was talking about setting stuff up to run out of the box, so to speak, much as has been done for the Debian "Potato" release.

(for that matter, you might want to grab the Debian release - I think "Potato" is out now - and see how well it works, I'm sure the guy who did it would like feedback).

Another route would be for us to provide RPMs of versions of stuff we know works, including PG compiled for a 16KB blocksize and --with-tcl enabled, the AOLserver driver, etc, rather than depend upon RPMs released with RH (the Postgres RPM in particular).

Posted by Lamar Owen on
Did I hear my name? 😊

Working OpenACS RPM's that can meet the RPM criteria are going to be tough.  Doable, but tough.

Oh, getting an automated 5-minute install isn't the killer --- getting the automatic upgrade right is going to make for some sleepless nights.

First, there will need to be interaction between at least five RPM's:
postgresql
postgresql-server
aolserver
aolserver-nspostgres
openacs

(or sub openacs-postgres instead of aolserver-nspostgres).  This isn't difficult with RPM -- RPM knows about and can be taught package dependencies and installation order.
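
To make the installation-order point concrete, here is a small sketch of how an order can be derived from declared dependencies. It is not how RPM itself is implemented, and the dependency edges below are assumptions made for the example rather than the contents of any real spec file.

```python
from graphlib import TopologicalSorter

# Assumed dependency edges: package -> packages it needs installed first.
DEPENDS = {
    "postgresql": set(),
    "postgresql-server": {"postgresql"},
    "aolserver": set(),
    "aolserver-nspostgres": {"aolserver", "postgresql"},
    "openacs": {"aolserver-nspostgres", "postgresql-server"},
}


def install_order(depends: dict[str, set[str]]) -> list[str]:
    """Return an order in which every package follows its dependencies."""
    return list(TopologicalSorter(depends).static_order())
```

Whatever edges the real packages end up declaring, the `openacs` package would sit at the end of the order, since everything else has to be in place before it.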

It becomes more difficult in the pre and post installation scriptlets. I call them scriptlets because you want to do the absolute minimum necessary here.  The more you do, the harder it is to debug, the more difficult it will be to get working right, and the less likely RedHat will ship it.

Debian's install scripts have far greater latitude here -- RPM's scriptlets are designed to do such mundane tasks as run ldconfig -- not parse and modify configuration files.

Now, default configs _can_ be placed.  But, currently, the ACS configuration is far too installation-specific -- there needs to be a single place for variable declarations for pageroot, etc -- while ad.tcl is a good template, there are _many_ variables defined that are terribly installation specific.

Now that I've given the pessimistic side, let's look a little closer at what we _can_ do:

We CAN separate a portion of the install out to utility scripts that can parse and mungle config files;

We CAN install the ACS to a central, nominally read-only 'holding cell' (such as /usr/share/openacs), that the utility installation script then pulls from and installs to the chosen webroot (but we CANNOT force a user into using /web as the webroot -- I lost count of how many places /web is hardcoded in the configs -- and RedHat most definitely won't ship a package requiring a new directory off the system root!);

We CAN make the upgrade of the 'holding cell' automatic -- but I don't think we should muck around with someone's live site during an automated upgrade;

We CAN use dirs under /etc for configuration, dirs under /var/lib for pageroots, dirs under /usr/share for shared data and master templates;

We CAN do all kinds of nonsense in scripts outside of the RPM scriptlets.
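
The 'holding cell' idea from the list above can be sketched as a utility script that runs outside the RPM scriptlets. This is a minimal illustration, not a proposed implementation: the function name is hypothetical, and the only rule it enforces is that an existing site is never overwritten.

```python
import shutil
from pathlib import Path


def install_site(holding_cell: Path, webroot: Path) -> None:
    """Copy the pristine tree into a new webroot; never touch an existing site."""
    if webroot.exists():
        raise FileExistsError(f"{webroot} already exists; refusing to overwrite")
    # copytree creates the target directory and any missing parents.
    shutil.copytree(holding_cell, webroot)
```

A package upgrade could then replace the holding cell freely, because live sites are copies that only this script ever creates.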

The current Debian package is a good starting point -- but for OpenACS RPM's to fly, a good set of AOLserver RPM's must fly.

I am ready, willing, and able to do any and all of these -- within my time constraints.  The last two months have been terrible -- the next two might not be so bad.

RedHat 7.0 is due Any Day Now (they just went public beta) -- RedHat 7.1 should theoretically be due sometime next March, if they stick to their typical six month schedule.  If we have the goal of solid RPM's by RedHat 7.1's freeze date (which likely will be mid to late January), then I believe, with some creative help, I can meet that goal.

There will be much to discuss and much to plan with these -- I know from hard experience that once you've released an RPM, you then have to upgrade from that RPM -- major changes to RPM structure become royal pains at that point.  So, the layout and installation procedure need to be worked out _first_ -- in fact, that doesn't need to be RPM-specific AT ALL -- the same scripts an RPM user would use to throw up an OpenACS site are just as usable for any OpenACS user.

But, I can already tell you, the current '/web'-centricity won't fly in the RPM world.  So, the installation procedure, layout, and scripts need to be applicable regardless of where OpenACS is installed -- use environment variables or global configuration files located in /etc/openacs to find everything.
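
One way to picture "environment variables or global configuration files" is a lookup with a fixed precedence. In this sketch the variable name `OPENACS_WEBROOT`, the file `/etc/openacs/webroot`, and the fallback path are all hypothetical names chosen for the example.

```python
import os
from pathlib import Path


def find_webroot(
    environ=os.environ,
    config_file: Path = Path("/etc/openacs/webroot"),
    default: str = "/var/lib/openacs/server1",
) -> str:
    """Environment variable wins, then the config file, then the default."""
    if environ.get("OPENACS_WEBROOT"):
        return environ["OPENACS_WEBROOT"]
    if config_file.is_file():
        return config_file.read_text().strip()
    return default
```

Scripts that call a single lookup like this never need a hardcoded `/web`, wherever the site actually lives.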

Do it right, and I can even give you aolserver-chroot and openacs-chroot.

Then there's the upgrading issue.  Do we want to go there?  Just how upgradable is the existing ACS installation?  No, a 'dump your data then reload it after loading a new datamodel' is not an upgrade path. Like the installation, the same scripts an RPM would use would be terrific for ANY ACS user -- not just an RPM user.  If you even want to use perl for those scripts, that's fine -- as perl is almost guaranteed to be installed on a working RedHat system.

I'm currently at war over the PostgreSQL upgrade procedure -- or lack thereof -- there seems to almost be hostility towards the idea that a smooth upgrade is a pressing issue -- dump/restore has been done there so long that the foolishness of that approach is invisible to core developers.

I would hope to see a little more cooperation here for the needs of those who simply want to upgrade smoothly -- and a more flexible installation/upgrade situation means more ACS users, RPM or not.

And I'm willing to help out in this area as well.  Good solid installer/upgrader scripts central to the core ACS/OpenACS packages will make advocating the package much easier, both to RedHat, and to users of other packages.  Having specialized RPM install scripts won't help those who want to install OpenACS/AOLserver on a more secure OS (such as OpenBSD).  Let's try for the general case, then RPM's will be much easier and more robust.

Posted by Ben Adida on
I am really glad we have Lamar on this team :) This is precisely the information we need. I think planning this out correctly will be a big win in many ways.

At the same time, let's make sure we agree on some principles:

- upgrading a customized OpenACS will *not* be possible. This is just the way it is. With a data model, edited files, this won't be doable automatically.

- upgrading a centralized holding area is nice, but not that impressive, since it's just Tcl code (I suspect the holding area won't have a DB instance associated with it, right?).

- the real issue here is allowing people to quickly install OpenACS to get a web site up and running. It is a "booster" toolkit, not really an application that can easily be upgraded.

Thus, how acceptable/horrible would it be to create RPMs that provide only install capability, and not upgrade. The purpose is clear: to facilitate the initial installation. I think if we do that without making claims to other neato RPM features, we'll have made significant progress.

Posted by Richard Li on
FYI: ACS4 has an HTML wizard-type installer that sets up your config files, loads your data models, and installs the selected APM packages automatically.
Posted by Don Baccus on
I think Ben's right.  Additionally, I think the issue of PG upgradability is far more important to serious users of the database (not necessarily OpenACS) - pg_dump'ing when your dump is greater than 2GB is a royal pain, with the user needing to plot a strategy of dumping out tables in an order that doesn't break dependencies, etc.  As long as pg_dumps occasionally need editing or massaging to reliably reload, this will always be true.

Plus, it's a bush-league "feature" of PG.

The same goes for the need to initdb so frequently.  As I've mentioned to the developers in the past, using SQL to init system tables as much as possible (as, say, Oracle does) rather than having them built by internal magic is slow, but for active users it is a lot less annoying than destroying and rebuilding your database with each and every release, as is necessary when upgrading Postgres.  Many changes could be accomplished with judicious updates and inserts into system tables via SQL.

So Lamar can count on my support on the PG upgrade issue when it next arises (Ned Lilly might be a useful point of attack on this one, Lamar, Great Bridge seems more sensitive to broad-based user needs than the developer's group has been traditionally).

But upgrading an existing ACS installation automatically isn't really possible, as Ben mentions.  Among other things, the current implementation of "alter table" is too weak to make it possible in the  general case even if customization issues were moot (which they aren't, of course).

And a quick install to get things underway is what folks really need.  You can't customize without digging under the hood, so by the time upgrading is necessary the user should be pretty familiar with the pieces.  Upgrade .sql scripts are something we should supply, but I at  least would never want to run them via an automatic script, at least not until having studied them to see how they interact with my customizations.  It is very common when customizing the ACS to create tables which relate to "users" and similar core tables, so upgrades need to be studied before they're applied in all but the most trivial circumstances.

When I get back in October, I'll be willing to help out a bit on this project, at least to the extent of helping to test RPMs and review/help with documentation (I don't know anything about how they're built and created and am not all that eager to learn that level of detail).

Posted by Lamar Owen on
[Glad to be here, Ben.  This is something I actually know a little about -- although I'm learning more and more SQL as the days go by (mostly due to the PostgreSQL/PHP-backed website I'm contracting for)]

Well, Ben, Don: whether we support upgrading or not is really immaterial from an RPM standpoint -- because there is no such thing as an RPM upgrade, in actuality.

While 'rpm -U' and kin give the impression of an upgrade, it really isn't an upgrade unless the RPM has scriptlets to make it one -- other than some fancy dependency/file/etc stuff that goes on in RPM's database, that is, but we don't need to go into that for this discussion 😊.  Without the scripts it's just an 'install new version/wipe old version' ritual that is very well automated.

What does 'rpm -U' do that 'rpm -i' doesn't?  The 'rpm -i' simply won't overwrite what's already there -- 'rpm -U' installs the new RPM on top of the old RPM's contents -- overwriting files as it is instructed.  Then, 'rpm -U' performs an 'rpm -e' of the old package -- skipping, of course, overwritten files.  A package contains more than just files -- there is version info, dependency info, membership info, and other info critical to proper installation/upgrading that lets RPM do the Right Thing (most of the time, that is).  Berkeley DB sits at the backend of RPM's database.

Now, here's the tickler -- most RPM users are accustomed to simply typing 'rpm -Uvh list-of-package-names' and letting RPM sort out the fallout -- beyond that, most RedHat users upgrade by booting the new OS disk and selecting 'Upgrade' -- and anaconda (RedHat's installer) will then dutifully do an 'rpm -U' for each package that needs it -- with no user interaction possible other than selecting which packages to upgrade.  No, the package itself can't deselect itself, unfortunately.

A 'holding cell' prevents user sites from being overwritten, while also providing a local repository of the latest OpenACS code.  And, if our goal of RedHat shipping OpenACS is realized, we _will_ have to confront this demon sooner or later.  There are no special cases allowed by anaconda (except on the much less valuable (for our purposes) 'PowerTools' CD).  Although it is going to be a neat trick to get RedHat to put AOLserver on the main CD set (it's up to three CD's, now).

If planned for and implemented properly, sure, it's possible to put a DB instance on the holding cell, for the impressionable folks.  aD is currently running arsdigita.com on a CVS checkout, right?  If they can do a cvs update on a live tree and get results, then we should be able to do similar.  HOWEVER, datamodel changes are much trickier -- but, maybe the policy should be that the holding cell installation's datamodel is going to be blown away at every upgrade?  I _know_ the datamodel is going to be rough to update in-place on PostgreSQL.

As a 'booster toolkit', it is, IMHO, essential to have live sites uninterrupted during upgrades -- which are unavoidable on a RedHat system run by typical RedHat admins -- thus, my suggestion of a holding cell -- /usr/share/openacs-version is the cell, upgradable at will.  The install scripts simply make copies from the repository as needed for fresh installs -- and customized sites aren't touched.

Although migration and update 'guidescripts' would be nice, to assist with determining just what needs to be updated on any particular ACS install.

But, the short of it is an RPM is not practically made 'install-only' -- unless you want our RPM's abort of an upgrade to abort someone's OS upgrade (a real quick way to get kicked off the RedHat CD's....).  RPM's are pretty brain-dead -- but that is one of the reasons RPM-based systems, when the packager(s) plan and build dependency trees properly, install and upgrade so robustly: there is very little happening, which means that there is very little that can go wrong.

----
Tangent:
This HTML-wizard package manager sounds intriguing.....

----

Don, your help is greatly coveted.  I have already shaken the Great Bridge tree, but no apples have fallen as yet.  Gravity must be weaker in Norfolk.  Actually, I was contacted by them regarding installation issues, and I threw the upgrade issue out with the reply.

Since we can't practically prevent a user from attempting an upgrade due to RPM's design, we will have to work around it -- the holding cell idea is one way to provide an upgrade 'target' that is safely overwritable -- I'm sure there are others.

Comments?

I really would like an OpenACS RPM and so far the difficulties mentioned that I understand are:

    1. Installing direct to a /web partition where the disk space and file system is correct size and type is awkward.

    2. Upgrading a Postgres data structure without a dump and reload is ... shall we call it beyond the state of the art for Postgres?

    3. Upgrading TCL scripts, SQL scripts and in-production user introduced customizations is well... well hard?

    I have a thought, this problem needs to be subdivided into smaller pieces. Now I am just thinking here... like a pizza, there are more ways to slice it than just one.

    First; nsd, the java mail programs, and Postgres can be gathered up as RPMs.

    The hard part is the OpenACS system has data structures, algorithms and user visible functions spread out.

    Parts of OpenACS are specific to AOLserver 3.0. Other parts require TCL 7.6. Some important database functions will be solved differently when PostgreSQL changes to version 7.2.

    A data structure like "ecommerce.sql" and the hard work of offering that data correctly to users occurs in several different places, and so far I am mostly just grepping to get an idea of what happens.

    So here is a proposal, maybe one way to organize OpenACS is to create a Postgres relational database and copy the entire OpenACS system into that data structure.

    So, a select * from modules where version = '3.2.2'; would regenerate the entire OpenACS 3.2.2 system.

    The idea is to store OpenACS in a series of relational tables. The list of queries we wish to run against these tables will drive the design of the table data structure.
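
As a rough sketch of what "OpenACS in a table" could look like, the example below stores file contents keyed by version and pulls one version back out. It uses SQLite from the Python standard library in place of Postgres so that it is self-contained, and the `modules` columns and sample rows are invented for illustration.

```python
import sqlite3


def files_for_version(conn: sqlite3.Connection, version: str) -> dict[str, str]:
    """Return {path: content} for every file stored under one version."""
    rows = conn.execute(
        "SELECT path, content FROM modules WHERE version = ? ORDER BY path",
        (version,),
    )
    return dict(rows.fetchall())


# Hypothetical schema and sample data, held in memory only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE modules (version TEXT, path TEXT, content TEXT)")
conn.executemany(
    "INSERT INTO modules VALUES (?, ?, ?)",
    [
        ("3.2.2", "www/index.tcl", "# front page\n"),
        ("3.2.2", "www/doc/sql/ecommerce.sql", "-- ecommerce tables\n"),
        ("3.2.1", "www/index.tcl", "# older front page\n"),
    ],
)
```

Calling `files_for_version(conn, "3.2.2")` returns the two files stored for that version; writing them to disk would be the "regenerate" step.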

    So the RPM or tarball or whatever of OpenACS has these parts:

    1. A front end script.

    2. A dumpfile or a complete database.

    3. Dependency stuff for Postgresql RPMs etc.


    So, what sort of queries would this database respond to?

    1. Output the documentation for installation, for any module, for any item, table, screen, transaction. Search every comment in every file. A one line title for every paragraph. FAQ's, OpenACS installation tables.

    2. Output all the tables and screens for any version.

    3. Output the changed parts between versions for any module.

    4. Do a diff on a production system against any reference version.

    5. Give OpenACS users an elegant way to contribute all the pieces of a new module back to OpenACS without driving Ben Adida crazy.

    That means we need a gadget to Parse files in a production system and complain about missing documentation, missing comments. And finally, import the diff of the production system into the database.

    6.  We need to pick a good way to label each file with the dependencies on specific versions of TCL, Postgresql, and nsd that are now embedded implicitly in the big version number for the entire OpenACS suite of files.
A production installation needs lots of help working through the hassles of sticking with an old reliable executable while adding new functionality from later OpenACS releases.


    7. My OpenACS system, the whole caboodle of 3.2.2 plus AOLServer 3.0 is 4 gigs, very roughly.
[A quoted reply system would be nice....]

[First, thanks for taking the time to respond!]

<blockquote>1. Installing direct to a /web partition where the disk space and file system is correct size and type is awkward.
</blockquote>

No, it's not 'awkward' -- it's against the Linux Filesystem Hierarchy Standard version 2.1 (available at pathname.com).  For RPM's to be shippable by RedHat, they have to be FHS-2.1 compliant.  /web is far from compliant.

<blockquote>    3. Upgrading TCL scripts, SQL scripts and in-production user introduced customizations is well... well hard?
</blockquote>

Impossible, not hard.  If not impossible, totally impractical.  And unwarranted -- someone might actually _want_ the old behavior...

<blockquote>    First; nsd, the java mail programs, and Postgres can be gathered up as RPMs.
</blockquote>

PostgreSQL RPMs already are done.  Need information on the preferred JRE to use for nsjava RPM's.  AOLserver will have its own RPMset -- including a separate postgres-driver subpackage.

<blockquote>    So the RPM or tarball or whatever of OpenACS has these parts:
</blockquote>

<blockquote>    1. A front end script.
</blockquote>

<blockquote>    2. A dumpfile or a complete database.
</blockquote>

Can't go into the standard PostgreSQL database location, unless you want to anger existing PostgreSQL users by overwriting their data. So a secondary location will have to be chosen -- and the existing PostgreSQL RPM startup scripts don't work well with that, so a separate script would have to be built... Doable, but not pretty.  But, the binary data tree for each architecture possibly would be different, due to endian issues.....

<blockquote>    3. Dependency stuff for Postgresql RPMs etc.
</blockquote>

RPM handles nearly all of the dependencies in a fully automated fashion -- you don't have to specify most of them.

The whole idea of 'OpenACS in a database' is already done for the most part -- it's called CVS.  Storing the whole OpenACS inside a PostgreSQL database is likely to run up against the tuple size limit.

<blockquote>    6. We need to pick a good way to label each file with the dependencies on specific versions of TCL, Postgresql,
</blockquote>

This could be useful -- but not just for RPM's.  Of course, it makes it horrifically complicated for the developers, who have to do the labeling of version information.  Is it not maybe better to say 'to upgrade to version x.yy of OpenACS requires PostgreSQL zz.aa.bb, AOLserver cc.dd.ee, and the driver version ff.gg.hh?  There are many changes between versions that are not incremental -- 3.x to 4.x, for example, will require changes to nearly every Tcl file in every module.

As to package dependencies as I said above, RPM handles that very well and very simply -- and almost entirely automatedly.  Very few dependencies have to be hand-entered -- and even those are on a package-wide basis, not file-by-file.
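
The release-level alternative ("version x.yy requires PostgreSQL zz.aa.bb ...") amounts to one small table per release rather than labels on every file. A sketch of such a check follows; the version numbers in `REQUIRES` are placeholders and the function names are hypothetical.

```python
# Hypothetical requirements table: the version numbers are placeholders.
REQUIRES = {
    "3.2.2": {"postgresql": (7, 0, 0), "aolserver": (3, 0, 0)},
}


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '7.0.2' into (7, 0, 2); pad short versions such as '3.0' with zeros."""
    parts = tuple(int(part) for part in text.split("."))
    return parts + (0,) * (3 - len(parts))


def unmet(release: str, installed: dict[str, str]) -> list[str]:
    """List components that are missing or older than the release needs."""
    problems = []
    for name, minimum in REQUIRES[release].items():
        have = installed.get(name)
        if have is None or parse_version(have) < minimum:
            problems.append(name)
    return problems
```

Comparing versions as integer tuples avoids the string-comparison trap where "10.0" sorts before "7.0".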

Interesting thoughts, though.

The dependency stuff will be largely handled by the 4.0 package manager - ACS version dependencies, not PG, which should be OK.  PG version dependencies will be global to the OpenACS version, most likely, given that much of the datamodel and many of the utility routines are shared.  4.0 will probably require PG 7.1, for instance, unless we choose to ignore the new outer join functionality which will probably appear in that version!

Lamar - where should "/web" be moved to?  There's no reason we have to stick with this, especially in the RPM'd version.  Where does Apache expect things by default when you bring up their examples?

"PostgreSQL RPMs already are done. Need information on the preferred JRE to use for nsjava RPM's. AOLserver will have its own RPMset -- including a separate postgres-driver subpackage."

nsjava requires a JDK, not a JRE. So far the only JDK that is acceptable for use on Linux is Blackdown's jdk1.1.8.

RE: /web
Apache RPM's on RedHat use /var/www.  I'd say use /var/lib/openacs/servername/whatever -- or even /var/openacs/servername/whatever -- although that is not strictly FHS compliant.  The first one is.

The best thing is to variablize the dependency on the hardcoded /web out -- what's wrong with a $serverhome, $webroot, or similar?  This way any number of configs can be supported -- and, more importantly, multiple ACS installations per box (one per AOLserver 3.0 process) can be supported easily.
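
To illustrate how a single variable could replace the hardcoded /web and still allow several installations per box, here is a minimal sketch. The subdirectory names `www` and `log` and the function name are assumptions for the example, not the actual ACS layout.

```python
from pathlib import PurePosixPath


def server_paths(servername: str, base: str = "/var/lib/openacs") -> dict[str, str]:
    """Derive every per-site path from one base directory and the server name."""
    home = PurePosixPath(base) / servername
    return {
        "serverhome": str(home),
        "webroot": str(home / "www"),
        "log": str(home / "log"),
    }
```

Passing `base="/web"` reproduces the current layout, so existing sites would not have to move just because the default changes.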

RE: JRE vs JDK.  Thanks for the clarification, Dan.

If I get any more interested in how to automate the OpenACS installation, I will have to substitute deeds and code for words. But I am not a very good programmer, I am mostly an administrator.

    I restate the post above. RPM alone isn't flexible enough to make an OpenACS distribution.

    OpenACS users will add modules. Somebody may port OpenACS to PerlCGI and Apache. Somebody may migrate back to Oracle.

    Some OpenACS users will build large, professional and dedicated systems. Some users will work on a shoestring with four 512 meg hard disks in a DX-100 box in a hospital in a 3rd world country.

    On a development side, how can the volunteer keepers of the pristine source cope with the widely variable quality and widely variable dependencies of many users?

    I suggest the OpenACS RPM contain a database of OpenACS systems and contributed modules. Plus the RPM will contain conventional tools and documentation.

    So OpenACS would put out one RPM. The RPM can actually be upgradable even at a production site, and the only dependency would be for a Postgres database.

    This RPM would go to a fixed file location like RPMs must.

    This RPM would offer: documentation, tools, contributed tools, a database access front end (a miniOpenACS maybe) and several separate databases or dumpfiles.

    The user would run the database front end to either get a copy of the current OpenACS or run select statements to pull up an alternate system or information about OpenACS variations.

    A new user can get started with the basic OpenACS reference distribution. An established system user can go shopping for new modules fitting the constraints present in the established system.

    A database supports user contributed growth and evolution of the OpenACS system. The user would run contributed tools on her existing OpenACS system to make a diff of her system for contribution back to the OpenACS contributions database.


    A database of OpenACS and contributions supports the idea of building a tool box, building on other people's work, creating a setting where many people around the world contribute to the development of OpenACS to serve many human needs.

    A database would change the OpenACS coordinator's problem to figuring out how to clarify the contributions. The coordinators would have a problem of nearly redundant contributions. The coordinators would need to write a new module for OpenACS to marshall contributors into "workgroups". Like bboard, OpenACS would be prospecting for volunteer coordinators to massage contributed files into named and tested and  blessed modules.

    There could even be a feedback loop: a user starts with a reference system, modifies it, and contributes a developmental diff back to OpenACS. Coordinators massage several developmental diffs into a new reference-quality module, and OpenACS creates for the contributors an upgrade diff that replaces the developmental diff with the new reference-quality code.

    Un-normalized view of the system and contributions database:

    Filename, author, date, comment(s), statement(s), earliest-ACS version, latest-ACS-version, called-by(), calls(), earliest-database-version, latest-database-version, script-language, earliest-script-version, latest-script-version, table(s), line-number, module-name(s), web-example, author-email, license.

> I restate the post above. RPM alone isn't flexible enough to make an OpenACS distribution.

I think we read your first post, I think we simply disagree with it, at least in regard to how to distribute OpenACS on standard RedHat CD images. As a proof by counterexample, note that OpenACS is already distributed by Debian in package form, therefore it seems obvious that RPMs are flexible enough to make an OpenACS distribution (though some feel RH RPMs are less flexible than Debian's system, they're of roughly equivalent power).

If people port OpenACS to Perl, they can maintain their own distribution as a layer on our datamodel, if they wish to. We're very unlikely to support multi-language versions of OpenACS. If we did, we could simply release it as a separate RPM anyway.

As far as much of the rest of what you're talking about, on our site (openacs.org) we will maintain descriptions of modules, their release history, etc in the database, using the SDM, with code archived via CVS. When we speak of RPM distribution, we're speaking of a fixed cut of a well-defined release. Users who want to fish from contributed modules that haven't made it into the release, etc, can do so from openacs.org. The ACS 4.0 package manager will help ensure that version dependencies, etc are correct when the package is actually installed by the user. The necessary information will be stored in the database by the package manager describing each user's exact configuration.

We very likely will end up with (additional) volunteer coordinators to work with people who want to contribute modules, etc. Given that today's "coordinators" are volunteers and that OpenACS is gathering steam, it's almost a given.

To sum up, I think you may be confusing two separate issues, the first being "how do we manage the variety of versions, modules, etc which will grow as OpenACS grows", the second being "how do we cut fixed releases for Linux distributions that require them". Using the database and associated software modules to manage the first problem makes a lot of sense (and is why the SDM exists and why aD is writing the package manager for ACS 4.0), but not the second, IMO.