Forum OpenACS Development: If your installation is failing at acs-content-repository...

...then you may have the wrong collating order set in your postgres installation. Try this query:

select deci, ascii(code) from tree_encodings order by code limit 5;

The results should look like this:

 deci | ascii
------+-------
    0 |    48
    1 |    49
    2 |    50
    3 |    51
    4 |    52

If they don't, you need to do an initdb with LANG=C. See this thread on the PG mailing list. Having the collation order wrong causes the tree support to fail; acs-content-repository not installing is a side effect of this.

The bigger question is do we want to require a specific collation order to run OpenACS? Especially as it's a pain to change if you've got it wrong.

It would be nice not to be dependent on this, how hard would it be to get rid of it?  Did you take a look?  Dan wrote this so he may have some ideas.

If we need to maintain the dependency then we should make sure it's clearly documented.  I might be able to check at bootstrap/install time just like I check now for outer joins ...

It works if you have just 0-9A-Z in tree_encodings (deci 0..35). This will tend to make your tree_sortkeys longer of course.

If we keep it as is, a dependency check would be a good idea. A simple select 'a' < 'A' might do the trick.

I think the problem that steve is talking about is related to a pg bug where the coallating order is wrong for locales other than 'C'.  This should get fixed at some point.  However, it should be possible to dynamically generate the tree_encodings table so that the decimal code matches the collating order of the machine that it iss running on when the core is first loaded.

I'll see if I can fix the initialization of the tree_encodings table, so that the locale is not an issue.

Yes, I had a go at generating the tree_encodings dynamically but still had problems (I think because comparisons can be made between multiple characters? (see the bottom of another PG mailing list msg - "Btw., actually the rules are more complicated: ab < Ab < ac".))

It was getting late though, and if anyone can do it it's you Dan!

Ok, a little investigation on locale support turned up an ugly fact. Here is quote from the documentation on locale:

"The only severe drawback of using the locale support in PostgreSQL is its speed. So use locale only if you actually need it. It should be noted in particular that selecting a non-C locale disables index optimizations for LIKE and ~ operators, which can make a huge difference in the speed of searches that use those operators."

This a big problem for openacs, as we rely on index optimizations of 'like' for tree queries. Since the acs_objects table is quaranteed to get quite large in a normal openacs installation, using a non-C locale could result in the system being unusable due to the lack of index support for tree queries.

I guess that clinches it. So, we need a bootstrap test, and I need to do an initdb...
Thanks for the research, Dan - clearly we need a bootstrap/install test to enforce LANG=C, I'll look into this.

Steve - can you do an initdb somewhere else and keep your current db around?  Then you could test my check for me when I add it ...

Interestingly, Lamar Owen's PG init scripts have been setting LC_ALL=C when starting up the PG server, and when doing an initial initdb, no matter what the 'real' locale of your machine may be.  At least for PG 7.1 and 7.1.2, I've not checked earlier RPMs.

I think that suggests someone else discovered this issue before we did, and came to the same conclusion -- just always use the C locale!
My suspicion is that this means only folks who install PG from tarballs would have had a chance to encounter this problem; those
installing and running PG from Lamar's RPMs are already "safe"?

This probably explains why only some OpenACS4 testers have noticed the problem.

You also have to config with "--enable-locale", which I think is off by default.

I don't see any easy way to check for this other than to simply do a "select 'a' < 'A'".  The lc_collate information is stored in pg_control and doesn't appear to be shadowed in a PG table or otherwise available from SQL.

There's a "contrib" program to dig the information out from pg_control but we don't want to require folks to compile such optional  utilities just to pass a bootstrap/install test and it would be unlikely to be included in distribution RPMs anyway.

I've e-mailed Tom Lane to see if there's a good way to do this other than the "select" hack ...

Hmm, I installed from the 7.1.2-4PGDG RPMs. I don't see the LC_ALL=C you're talking about -- do you mean in the /etc/rc.d/init.d/postgresql script?, and my postgres user has LANG=en_GB set. I have to manually initdb with LANG=C before it works.

Looks like there's a later set (-5PGDG) than the ones I installed, I wonder if these have been 'fixed'.

Sounds like the RPMs are distributed with --enable-locale, then.  The lc_collate value is derived from your LANG environment variable (you can override some of the specific locale stuff with other environment variables, too).

Where did the LANG=en_GB come from?  Your RH install?

I think I changed it myself (/etc/sysconfig/i18n) when I installed RH. The default was en_US or something... but the point is the RPMs I installed didn't override to LANG=C for the initdb.
OOPS!  My mistake, PG 7.1 RPMs set LANG=C, but 7.1.2-4PGDG does all sorts of fancy stuff trying to remember the initdb-time locale info, and to use that remembered locale when starting postmaster...

Looks like this change was done by mailto:teg@redhat.com, who commented:

# Handle locales slightly differently - always using "C" isn't a valid option

Hmmm.  Always using "C" was a nicely valid option for OpenACS... but since the official PG RPMs no longer do that, we can't rely on it.
So we'll need to go the 'test, and complain if it uses an interesting locale' approach?  Can the logic used in pg_controldata.c (basically reading and formatting info from the PG "control file") be recoded in TCL perhaps?

I've put in a check to compare 'a' with 'A' which complains if it's not in the expected order.  I've not committed this change yet, but will after I make a few more changes.