Forum OpenACS Development: OpenACS 4 fails to install

Posted by Yon Derek on
I'm trying to install the latest OpenACS from CVS on RH 6.2 / PG 7.1.2 / AOLserver 3.2+ad12, and acs-content-repository doesn't install correctly on startup (and the packages after it fail too).

Questions:

  • is this a known problem?
  • the last thing in the log is:
     
    
    [16/Aug/2001:13:51:32][7558.4101][-conn0-] Notice: BEGIN
    [16/Aug/2001:13:51:32][7558.4101][-conn0-] Notice: *ABORT STATE*
    [16/Aug/2001:13:51:32][7558.4101][-conn0-] Notice: COMMIT
    [16/Aug/2001:13:51:32][7558.4101][-conn0-] Notice: CREATE
    [16/Aug/2001:13:51:32][7558.4101][-conn0-] Notice: DROP
    [16/Aug/2001:13:51:32][7558.4101][-conn0-] Notice: Acs-content-repository not installed.
    Error:
    
    In other words, it doesn't tell me anything. Any tips on how to turn on more verbose debugging output so I can see what's wrong? (I presume it's a failing SQL statement but have no idea how to debug it.)
  • the install happily carries on. Wouldn't it be better to stop on such errors, given that they're almost fatal (not much will work afterwards anyway)?
Posted by Don Baccus on
It should be stopping on errors; that's something to look into. PG puts everything out on stdout, so you have to parse the output looking for error lines. We've had some problems at times getting all of it passed back into AOLserver.

If you set VERBOSE=ON in your nsd.tcl file, all queries will be logged. There's no equivalent in the bootstrap installer at the moment; the switch "--echo-all" would need to be passed to psql. We don't want to do that by default, though, since the datamodel is HUGE. And there's no psql option to dump only the queries that cause errors...
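Until the installer stops on errors itself, one way to see only the failures when loading a datamodel file by hand is to filter psql's output for lines of the form `psql:<file>:<line>: ERROR:`, the format visible in the logs quoted in this thread. A minimal shell sketch; the function names, database name and file name are placeholders, not part of OpenACS:

```shell
#!/bin/sh
# Minimal sketch: load one datamodel file with psql and print only the
# error lines. Function, database and file names are placeholders.

# psql reports failures while running a file as
#   psql:<file>:<line>: ERROR:  <message>
extract_psql_errors() {
    grep -E '^psql:[^:]+:[0-9]+: ERROR:'
}

load_sql_file() {
    db=$1
    file=$2
    # Merge stderr into the pipe so error lines are caught whichever
    # stream they arrive on.
    psql -f "$file" "$db" 2>&1 | extract_psql_errors
}
```

For example, `load_sql_file openacs4 types-create.sql`, run from the package's sql directory, would print just the ERROR lines, if there are any.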

Posted by Yon Derek on
As far as stopping on error goes: apm_packages_full_install (in apm-install-procs.tcl) catches the error. Re-throwing it would stop the install.

Strangely enough, I did get it to work with 7.1.2 on a Windows machine. I've reinstalled the latest 7.1.2 RPMs from postgresql.org but still have the same issue on RH 6.2.

Posted by Yon Derek on
First, this patch is highly recommended. Right now error reporting is broken: when there are errors in the *.sql files, you won't find out about them unless you apply this patch:
Index: 00-database-procs-postgresql.tcl
===================================================================
RCS file: /cvsroot/openacs-4/packages/acs-tcl/tcl/00-database-procs-postgresql.tcl,v
retrieving revision 1.20
diff -u -r1.20 00-database-procs-postgresql.tcl
--- 00-database-procs-postgresql.tcl    2001/07/29 23:16:50     1.20
+++ 00-database-procs-postgresql.tcl    2001/08/17 01:02:52
@@ -636,8 +636,8 @@
     }

     if { $error_found } {
-        global errorCode
-        return -code error -errorinfo $error_lines -errorcode $errorCode
+        global errorCode errorInfo
+        return -code error -errorinfo $errorInfo -errorcode $errorCode $error_lines
     }
 }
Posted by Don Baccus on
If you can put together a quick patch to rethrow the error, I'll install it in the next few days. I'll be working on OpenACS issues over the weekend, but it looks like it's going to be administrative stuff, not code.

There's an acs bootstrap installer module in the openacs-4 package in the SDM (https://openacs.org/sdm), so feel free to submit a patch there.
If you don't have time, I'll try to get to it myself, but it will be a while.

The error from your server.log file would help a lot - post it here when you've got a moment (again, you need VERBOSE=ON when you install to get it).

Posted by Yon Derek on
More to the point: after adding this patch I've found out that the errors are:
Acs-content-repository not installed.
 Error:
psql:types-create.sql:322: ERROR:  CREATE TABLE: attribute "tree_sortkey" duplicated
psql:content-image.sql:36: ERROR:  acs_object_types_supertype_fk referential integrity violation - key referenced from acs_object_types not found in acs_object_types
psql:content-image.sql:84: ERROR:  cr_content_mime_map_ctyp_fk referential integrity violation - key referenced from cr_content_mime_type_map not found in acs_object_types
psql:content-create.sql:1014: ERROR:  cr_content_mime_map_ctyp_fk referential integrity violation - key referenced from cr_content_mime_type_map not found in acs_object_types
They confuse me, though: for example, I don't see tree_sortkey anywhere in types-create.sql. Ideas, anyone?
Posted by Don Baccus on
types-create.sql calls content_type__create_type, which conditionally does a "create table" for the new type.  It would appear that create is causing the problem.

Now ...

I just did a fresh install of OpenACS 4 on my RH 6.2 box with PG 7.1.2, and didn't run into a problem.

You've gotten it to install without a problem on Windoze.

This makes me think something's wrong with your PG installation. Is the box a pure development box? Can you reinstall PG, making sure you get rid of all old traces of it, then do an "initdb", a fresh "createdb", and a fresh reinstall of OpenACS 4?

I know that sounds like a lot of work, but given that it's installing for others, and even for you on your Windows box, I'm suspicious that something's screwy on the RH 6.2 box.

Also make sure it is really running your PG 7.1.2. In other words, make sure any old postmaster has been stopped, the old version has been thoroughly deleted, and you don't have any RH-installed components lying around in your PATH (the RH RPMs install PG in a different place than the PG distribution tarball does).

Etc etc etc...
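A quick way to do that check from the shell: see which psql comes first on your PATH, what version it reports, and what version the running server reports (the last one is answered by the postmaster itself, so it also catches a stale server still running). A sketch; `check_pg` and `version_from_banner` are hypothetical helper names, and the database name is whatever you created for OpenACS:

```shell
#!/bin/sh
# Sketch: check which psql is first on PATH and which server version it
# actually talks to. Helper names are hypothetical; the database name is
# passed as the first argument.

# "psql (PostgreSQL) 7.1.2" -> "7.1.2"; other lines print nothing
version_from_banner() {
    sed -n 's/^psql (PostgreSQL) \([0-9][0-9.]*\).*/\1/p'
}

check_pg() {
    db=$1
    echo "psql on PATH: $(command -v psql)"
    echo "client version: $(psql --version | version_from_banner)"
    # version() is evaluated by the running server, not the client binary
    echo "server: $(psql -t -c 'SELECT version();' "$db")"
}
```

If the client and server versions disagree, or psql is not where you expected, something from an older install is still around.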

Posted by Steve Woodcock on
Sounds like you've got a locale problem.
Posted by Jonathan Marsden on
For OACS4, the PG 7.1.2 RPMs do the *wrong* thing, in that they install and run PG using the locale set up in /etc/sysconfig/i18n (in Red Hat).

Under Red Hat, the quick fix for now is to do

echo LANG=C >>/etc/sysconfig/i18n

*before* installing PG (from RPMs) and bringing it up for the first time (in particular, before it runs initdb).

Once you have installed the PG 7.1.2 RPMs and done /sbin/service postgresql start for the first time (which does the initdb), you can then remove the added line from your /etc/sysconfig/i18n, which sets things back to your native locale (often LANG=en_US), if you so wish.

This is a bad situation (ideally, OACS4 would work no matter which locale PG was using), but didn't Don recently add a check for it to the OACS4 installer?

Note: To try this, you'll need to rpm -e postgresql-server and then rm -rf /var/lib/pgsql to get rid of your old database, then reinstall PG with the edited i18n file. Otherwise your old db with its unwanted locale info will be retained.
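The temporary i18n edit above can be scripted so it is easy to undo once initdb has run. A sketch, assuming a Red Hat style /etc/sysconfig/i18n containing plain `LANG=...` lines; the helper names are made up, and the path is a parameter so you can try it on a copy first:

```shell
#!/bin/sh
# Sketch of the temporary /etc/sysconfig/i18n edit. Helper names are
# hypothetical; the file path is passed in so a copy can be used first.

add_lang_c() {
    # Append LANG=C unless that exact line is already present. Because it
    # comes later in the file, it overrides an earlier LANG= line when the
    # file is sourced by a shell script.
    grep -qx 'LANG=C' "$1" || echo 'LANG=C' >> "$1"
}

remove_lang_c() {
    # Drop the line again after initdb, keeping a .bak copy.
    # Caveat: this also removes a LANG=C line that was there originally.
    sed -i.bak '/^LANG=C$/d' "$1"
}
```

Following the steps described above, the order would be: `rpm -e postgresql-server`, `rm -rf /var/lib/pgsql`, `add_lang_c /etc/sysconfig/i18n`, reinstall the PG RPMs, `/sbin/service postgresql start` (which runs initdb), then optionally `remove_lang_c /etc/sysconfig/i18n`.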

Posted by Yon Derek on
Thanks, that was indeed a locale problem.

However, in my case, solving it wasn't as easy as changing /etc/sysconfig/i18n: just changing it and reinstalling postgresql-server didn't work. Databases were still created in the "en_US" locale (even though the /var/lib/pgsql/initdb.i18n file, presumably created by the install, had "C" in it). The LANG variable for all my users is still "en_US" despite /etc/sysconfig/i18n saying "C" (even after a reboot).

To be honest, I don't really know how I solved it. I did everything Jonathan suggested (which on its own was not enough), then switched to the postgres user, removed the /var/lib/pgsql/data directory, forced LANG to C (export LANG="C"), and ran initdb. Only after that was my db created with locale C (as verified by the pg_controldata utility, which is in the contrib directory of the PostgreSQL source distribution).
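The manual sequence that finally worked can be written down as a small script. This is a sketch, not a tested recipe: it is destructive (it deletes the existing cluster), so it only prints the commands unless DRY_RUN=0. The PGDATA default matches the RPM layout mentioned above, and the function names are hypothetical. Setting LC_ALL=C as well goes beyond what was described here: LC_ALL overrides LANG and any LC_* variables, so a stray one left in the environment cannot win.

```shell
#!/bin/sh
# Sketch: recreate the cluster with the C locale. Destructive, so it runs
# in dry-run mode (prints the commands) unless DRY_RUN=0.
PGDATA=${PGDATA:-/var/lib/pgsql/data}
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

reinit_cluster_c_locale() {
    # Run as the postgres user, with the postmaster stopped (assumption).
    run rm -rf "$PGDATA"
    # Force the locale for initdb only, overriding the login environment
    run env LANG=C LC_ALL=C initdb -D "$PGDATA"
}
```

Run it once as is to review the commands, then with DRY_RUN=0 to actually execute them.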

So, a question for anyone familiar with Red Hat's locale handling: why is LANG set to "en_US" even though i18n has "LANG=C" in it?

We (as in OpenACS) really need an install-time check for that, and good instructions on how to check for it and fix it. I don't really know which of the things I did actually forced PostgreSQL to create the db with the locale I want. Also, there is no good or easy way to check for it (except the pg_controldata utility mentioned above, which you usually need to compile yourself and run as root or as the postgres user).
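Until there is an install-time check, pg_controldata output can at least be checked mechanically. A sketch, assuming the 7.1-era contrib pg_controldata prints an `LC_COLLATE:` line (later PostgreSQL releases may not), with hypothetical helper names:

```shell
#!/bin/sh
# Sketch of a locale check on pg_controldata output. Assumes an
# "LC_COLLATE:" line as printed by the 7.1-era contrib tool.

collate_from_controldata() {
    sed -n 's/^LC_COLLATE:[[:space:]]*//p'
}

# Reads pg_controldata output on stdin; prints "ok" for C or POSIX
# (equivalent here), otherwise prints the locale found and fails.
check_c_locale() {
    collate=$(collate_from_controldata)
    case $collate in
        C|POSIX) echo ok ;;
        *) echo "LC_COLLATE is '$collate', expected C"; return 1 ;;
    esac
}
```

For example, `pg_controldata /var/lib/pgsql/data | check_c_locale`, run as root or as the postgres user; whether your pg_controldata takes the data directory as an argument or reads PGDATA may depend on the version.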

Posted by Jonathan Marsden on
Yon,

I tried an OpenACS4 install myself for the first time just now.

The PG locale stuff worked for me following my own instructions (this was RH 7.1 and the PG 7.1.3 RPMs); that is, make sure the initdb (done the first time you start postgresql and there is no pre-existing database) happens under LANG=C. Are you sure when you tried it you didn't have an existing older PG database present?

However, I now seem to have a more interesting issue... after installing PG 7.1.3, aolserver-3.3.1+ad13 and aolserver-nsxml and libxml2 2.4.1 and libxslt 1.0.1 (all from RPMs), I:

  1. Update openacs4 from CVS, then copy the files over to /var/lib/aolserver/servers/openacs4. Then rm -rf packages/acs-subsite, since it isn't 'PORTED' yet and seems to give an error about a missing file if you leave it in there.
  2. Edit /etc/aolserver/nsd.tcl to use a PG db and username of openacs4, and to expect the pageroot to be .../openacs4 rather than .../defaultacs.
  3. Fire up aolserver, point a browser at port 8000.
  4. Edit once more and restart to get rid of a warning about the fancy parser (it actually *was* enabled, just in a different way from what the installer seems to check for): convert the suggested .ini settings to .tcl (incidentally, shouldn't the installer offer both, or offer the .tcl form instead of .ini?), edit, and restart AOLserver.
  5. No warnings this time, select Next, ACS kernel installed fine, select Next again.
  6. Lots more installing happens and all looks fine, then it says 'Completing install sequence'. A few more commands are echoed, then nothing happens.

The log shows a referential integrity issue, followed by a problem writing a NULL into a NOT NULL field:

[24/Aug/2001:22:19:57][13925.4101][-conn0-] Error: Error sourcing /var/lib/aolserver/servers/openacs4/packages/acs-bootstrap-installer/installer/packages-install.tcl:
psql:acs-install.sql:196: ERROR:  apm_packages_package_key_fk referential integrity violation - key referenced from apm_packages not found in apm_package_types
psql:acs-install.sql:203: ERROR:  ExecAppend: Fail to add null value in not null attribute object_id

    invoked from within
"db_source_sql_file -callback apm_ns_write_callback acs-install.sql"
    invoked from within
"if { ![ad_acs_admin_node] } {
    ns_write "  <p><li> Completing Install sequence.<p>
    <blockquote><pre>"
    cd [file join [acs_root_dir] packages..."
    (file "/var/lib/aolserver/servers/openacs4/packages/acs-bootstrap-installer/installer/packages-install.tcl" line 34)
    invoked from within
"source $__file "

Repeated visits to http://myhostname:8000 just get me the same thing; I can't do anything useful with the installation at all.

Posted by Don Baccus on
Well, you're going to need acs-subsite even though it's not quite completely ported yet (Ben's hacking on it this week and is just about there).

What file does it complain is missing?

I've been doing regular reinstalls on my system, but the last one was last weekend, so it's possible we've broken something since.

There's a known issue at the moment regarding a circular dependency: the CR needs workflow, workflow needs acs-mail, and acs-mail needs the CR. This resulted from our getting rid of acs-notifications and folding its limited functionality into acs-mail.

Dan's fixing that in the very near future but at the moment installs are going to be broken if you download the tree.  Sorry 'bout that.

Found the problem I was seeing... acs-subsite.info refers to a couple of .sql files which do not exist in the CVS tree for PG, only for Oracle.

I'll email the acs-subsite porter. Meanwhile, the workaround is to delete or comment out the two lines:

            <file type="query_file" db_type="postgresql" path="www/admin/group-types/change-join-policy-2-postgresql.xql"/>
            <file type="query_file" db_type="postgresql" path="www/admin/group-types/change-join-policy-postgresql.xql"/>

which are lines 138 and 142 of the packages/acs-subsite/acs-subsite.info file.
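Until the regenerated .info is in CVS, the two lines can be removed with sed. A sketch; the function name is made up, and the pattern assumes those two entries are the only PostgreSQL query_file lines for change-join-policy:

```shell
#!/bin/sh
# Sketch of the workaround: delete the two PostgreSQL query_file entries
# that point at .xql files missing from CVS. A .bak copy is kept.
# Temporary: only needed until acs-subsite.info is regenerated.
drop_missing_xql_entries() {
    info=$1
    sed -i.bak '/change-join-policy.*-postgresql\.xql/d' "$info"
}
```

Usage: `drop_missing_xql_entries packages/acs-subsite/acs-subsite.info`; the original is kept as acs-subsite.info.bak.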

Posted by Steve Woodcock on
Doh! That was me. Sorry. Forgot to recreate the .info. Will fix in a minute.
Posted by Jonathan Marsden on
Don,

Can you describe the effect of this circular dependency issue on an install?

I'm trying to get acs-bootstrap-installer/installer/auto-install.tcl working and closer to doing what the interactive installer does... Now I'm not sure whether I'm seeing problems of my own making or the circular dependency thing you just warned us about.
BTW it looks like auto-install.tcl was never ported (no .xql file) -- is that correct?

Posted by Don Baccus on
No, auto-install wasn't ported. When my current load of client work hit, I pretty much pulled back to doing nothing but trying to organize the porting effort. I've written very little code (for OpenACS, that is) since getting back from my month-long sojourn in Canada.

As far as the results of the circular dependency go, I'm not certain at the moment because it is being worked on. The first symptom was actually a failure on the part of the CR to load because it referenced the non-existent acs-notifications package. Vinod fixed that and ran into the circular dependency problem, but I don't know how much of that got committed and how much of it is sitting waiting for a cure before getting committed.

Posted by Dan Wickstrom on
I haven't re-checked the loading yet, but the circular dependency on the CR has been removed.  I'll probably take a look at the loading tomorrow after I get some more work done on cms.
Posted by Don Baccus on
I expect to spend time on OpenACS tomorrow, too ...
Posted by Jonathan Marsden on
Don, OK.  I have the same results from using the interactive installer as my patched version of auto-install.tcl now.

Problem is, in both cases browsing to /register causes instant death of AOLserver, with no clear indication from logs as to why.  This also happens during attempts to install some (most?) non-core packages using /acs-admin/apm/packages-install.

Once I can install OpenACS4 and then install bboard (for example) and configure it and post a message or two to a bboard, I'll declare my auto-install.tcl useful and post a patch with my changes.

I don't *think* I'm seeing circular dependency issues, because all the packages you mentioned seem to install fine, with no errors.

BTW I went back to a nightly tarball from 18 Aug. and the behaviour is the same as with current (12 hours ago) CVS.  But the same installation of PG and AOLserver will run OpenACS 3.2.5 quite happily.

I'm told I may have to break out a debugger and attach it to nsd? But it seems odd this only happens to me, not to everyone else!?

Posted by Don Baccus on
Hmmm... make sure you're setting a large stacksize for your AOLserver threads. I've run out of stack space running OACS 4 with a hacked-up 3.2.5 nsd.tcl init file which didn't define a large stacksize. That caused stacktraces, not crashes (i.e. AOLserver didn't die, it just killed my thread), but it's worth your time to check nsd.tcl...
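A rough way to see what your nsd.tcl sets, assuming the setting appears as an `ns_param stacksize ...` line (which ns_section it belongs in depends on the AOLserver version, so compare with the sample nsd.tcl shipped with your server). The sketch only finds the line; it does not evaluate Tcl expressions such as `[expr 128*8192]`, and `show_stacksize` is a hypothetical name:

```shell
#!/bin/sh
# Sketch: report the thread stacksize line in an AOLserver nsd.tcl.
# Does not evaluate Tcl expressions or check which ns_section it is in.
show_stacksize() {
    conf=$1
    if grep -E '^[[:space:]]*ns_param[[:space:]]+stacksize' "$conf"; then
        return 0
    fi
    echo "no stacksize set in $conf; the server default will be used" >&2
    return 1
}
```

For example, `show_stacksize /etc/aolserver/nsd.tcl` prints the matching line, or a warning and a non-zero exit status if there is none.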

Other than that I don't have any brilliant ideas.  After all, our app code shouldn't be able to crash AOLserver.  Which version are you using?

Posted by Jonathan Marsden on
Stack size... hmmmm.  I'm using a hacked nsd.tcl from my 3.2.5 RPMs... so that could be it.

I'm running PG 7.1.3, AOLserver 3.3.1+ad13, and the PG driver from CVS as of 25 April 2001.  All from my RPMs.

Yes! That fixed it.  Thanks!

Maybe the installer could/should check that you have a larger-than-default stacksize setting?

BTW I see you checked in a change to the PG driver a couple of days ago, making some buffers 100x bigger... is that just playing safe, or does OACS4 overflow the smaller buffers in the earlier code?

Posted by Don Baccus on
Ahhh...good.  I'll look into checking for that in the installer, good idea.

I've got another set of changes to make to the driver, then I'll announce that folks should upgrade. (I'll escape backslashes in the bind variable emulation code, so that strings containing backslashes are inserted into PG as though PG were SQL92-compliant, rather than having the backslashes interpreted C-style.)
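PostgreSQL 7.1 treats a backslash inside a string literal as an escape character, so a value containing backslashes needs them doubled (along with single quotes) before it is spliced into a query as a literal. A shell illustration of that quoting rule only; it is not the driver's actual C code, and `pg_quote` is a made-up name:

```shell
#!/bin/sh
# Illustration of the quoting rule: double every backslash, then every
# single quote, and wrap the result in single quotes.
# Note: command substitution drops trailing newlines from the value.
pg_quote() {
    printf "'%s'" "$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e "s/'/''/g")"
}
```

Backslashes are doubled first so the quote-doubling step never produces a backslash that would then be doubled again.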

The query buffers for the insertion of pseudo large objects (the so-called driver BLOB hack) were far too small. I'd sized them for the select queries and boneheadedly didn't resize them when copying the declarations for the inserts. This was causing the driver to crash AOLserver under IRIX; if you recall that thread, the person trying to get it working engaged me in an ongoing e-mail dialogue in which we finally figured out the problem between us.

It's just luck that it's not crashing in other environments.

Posted by Don Baccus on
OK, I had to fiddle dependencies a bit but the Oracle and Postgres versions both install correctly now from current CVS sources.

Thanks for your patience, folks ...