Forum OpenACS Q&A: Re: nsd aborts when restarted after apparently successful openACS installation

I built AOLserver from the CVS head on Friday and did not encounter any problems. AOLserver 4 does not require Tcl 8.4.2; it will work just fine with 8.4.1, which has been out since October last year (and probably with 8.4.0 as well, but I don't know that first hand). And Tcl is quite mature and stable, so calling it bleeding edge is pretty absurd.

As for what aolserver can do that apache can't: for one, you can run OpenACS. Since that's what most of us here are focused on, it's reasonably important to us.

I don't want this to sound insulting but a lot of people have built aolserver from the cvs head without trouble. I guess it would be interesting to figure out where you went wrong but if you are too frustrated to bother that's fine as well. It would be nice to know what platform you are building for and what you gave to configure when you built tcl.

Well, yeah, OK, I admit I probably got a bit carried away there. By the looks of it the problem is a Tcl problem anyway. The relocation error looks like my Tcl is not linking properly. So I should probably be peeved with Tcl, not AOLserver.

I got Tcl 8.4.2, which is supposed to be a stable release, and I did build it with threads enabled, etc. Tcl 8.4.2 compiled flawlessly and failed only one of its self-tests out of 8000 or so; it was some file handling test or other...

I also compiled tk8.4.2, even though aolserver does not require tk... and tk8.4.2 compiled fine but failed many of its tests... many worked, but there were a lot of failures.

As you say, Tcl is supposed to be a mature and stable product as well, so I am disappointed with 8.4.2 so far... as the third release in the 8.4 series it should be OK, even if it was only released a few days ago. But I despise Tcl anyway so I am not surprised. I will try 8.4.1 anyway.

I have lost the configure commands that I used to build Tcl from my history, due to the demented way in which multiple gnome-terminals mangle one's history by reading it in when they start and not writing it until they exit. I.e. if you start two terminals, do a lot of commands in one, then exit it, it will write your commands into the history file. But if you then exit the second terminal, it will overwrite the history with its version, which has none of the commands you just did in the other terminal... Anyway, I recall that the only additional option I used was to enable threads.
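For anyone bitten by the same history problem: bash can be told to append to the history file instead of overwriting it. This is only a sketch for ~/.bashrc, and it assumes the shell running inside gnome-terminal is bash, which the post does not actually say.

```shell
# Assumption: the login shell is bash and this goes in ~/.bashrc.
# On exit, append this session's commands to the history file
# instead of replacing the file's contents.
shopt -s histappend

# Before each prompt, append any new commands to the history file,
# so they survive even if another terminal exits later.
# Note: this replaces any PROMPT_COMMAND that was already set.
PROMPT_COMMAND='history -a'
```

With both settings in place, commands typed in one terminal should end up in the history file regardless of the order in which the terminals are closed.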

Anyway, the aolserver build actually seemed to go fine. I configured the build with:

./configure --prefix=/usr/local/progs/aolserver-cvs --with-tcl=/usr/local/tcl/tcl8.4.2/lib

I made /usr/local/progs/aolserver a link to aolserver-cvs.
Then i installed the nssha and nsxml modules, but no luck.

Regarding the AOLserver build, it was unclear to me whether it was still necessary to apply the patch for ns_uuencode to work with binary files and/or the patch to make aolserver work with the -g flag. The instructions on what to do with the patches from sourceforge are rather unclear... so i am hoping they are no longer needed with AOLserver4

I'll see how I go with Tcl 8.4.1 and report back.

also, re the platform, it is Mandrake 8.2 with many additional bells and whistles.

re the relocation error i did find this on the net which was interesting...

----------------------------------------------------------
Most likely Mandrake's python was not using the RTLD_GLOBAL hack that Red Hat Linux had.  If libimlib-jpeg.so needs a symbol called "_gdk_malloc_image", it needs to have a DT_NEEDED entry that says what library to get it from.  (i.e., it needs to include -lgdk on the link line)

As you can see:

[msw@sid msw]$ ldd /usr/lib/libimlib-jpeg.so
        libjpeg.so.62 => /usr/lib/libjpeg.so.62 (0x4001a000)
        libc.so.6 => /lib/i686/libc.so.6 (0x42000000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x80000000)

This would be OK if libgdk_imlib.so linked against libgdk, but:

[msw@sid 8.0]$ ldd /usr/lib/libgdk_imlib.so
        libSM.so.6 => /usr/X11R6/lib/libSM.so.6 (0x4003c000)
        libICE.so.6 => /usr/X11R6/lib/libICE.so.6 (0x40046000)
        libXext.so.6 => /usr/X11R6/lib/libXext.so.6 (0x4005d000)
        libglib-1.2.so.0 => /usr/lib/libglib-1.2.so.0 (0x4006b000)
        libc.so.6 => /lib/i686/libc.so.6 (0x42000000)
        libX11.so.6 => /usr/X11R6/lib/libX11.so.6 (0x4008f000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x80000000)
        libdl.so.2 => /lib/libdl.so.2 (0x4016c000)

So it's all totally broken.  The RIGHT thing to do is fix gdkimlib.  I
think it would be much easier to make your application use gdkpixbuf
which doesn't suffer these coding errors.
Cheers,
Matt
-----------------------------------------------------

So Mandrake have been known to make a mess of their linking from time to time.

But the output of ldd on my nsd is

: jss: 22:23:49 /usr/local/src/gui/tcl/tcl8.4.2/unix ; ldd /usr/local/progs/aolserver/bin/nsd
    /usr/local/tcl/tcl8.4.2/lib/libtcl8.4.so => /usr/local/tcl/tcl8.4.2/lib/libtcl8.4.so (0x40016000)
    libnsd.so => /usr/local/progs/aolserver-cvs/lib/libnsd.so (0x400b8000)
    libnsthread.so => /usr/local/progs/aolserver-cvs/lib/libnsthread.so (0x40105000)
    libdl.so.2 => /lib/libdl.so.2 (0x4011f000)
    libpthread.so.0 => /lib/libpthread.so.0 (0x40122000)
    libm.so.6 => /lib/libm.so.6 (0x40138000)
    libc.so.6 => /lib/libc.so.6 (0x4015a000)
    /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)

So I cannot see how there can be any problem finding TclCompileScript@tclByteCodeType, as libtcl8.4.so DOES look like it has been linked in properly, and I expect TclCompileScript should actually be in libtcl8.4.so itself and not require anything else to be linked in at run-time...

Maybe TclCompileScript has been left out of the library, though that seems very, very unlikely to me, or Tcl would have failed many tests. I may try to find that symbol in libtcl8.4.so to check on that though. Maybe it's something to do with the "@tclByteCodeType" bit at the end.

argh.

I hope tcl8.4.1 works, because all i want to do is run AOLserver, not fix tcl8.4.2

John

Sorry if it was a bit unclear, but the reason I posted that bit about gdk_imlib etc. is that the linking problems with that library were resulting in a relocation error which at first looked similar to mine:

--------------------------------------------------
On Fri, Jul 26, 2002 at 07:18:49AM -0600, Don Allingham wrote:
I have been fighting the problem for quite a while, and I cannot come up
with a workable solution. Under Mandrake 8.2, I keep getting the
following error:

/usr/bin/python: relocation error: /usr/lib/libimlib-jpeg.so: undefined symbol: _gdk_malloc_image

Things work fine under RedHat, SuSE, debian, any other distribution that
I or anyone else has tested. I can get around the problem by using
LD_PRELOAD, but this isn't a really good solution, as it is not very
portable.

export LD_PRELOAD='/usr/X11R6/lib/libX11.so /usr/lib/libgdk_imlib.so.1 /usr/lib/libgdk.so'

I've tried forcing a load of GdkImlib, but this doesn't seem to have an
effect. The first call to any image handling routine, either from python
or from libglade, causes python to abort.

-------------------------------------------------

Unfortunately, in my case libtcl8.4.so should not need to link to anything else to get TclCompileScript anyway; all it should need is:
: jss: 22:41:30 /usr/local/src/gui/tcl/tcl8.4.2/unix ; ldd /usr/local/tcl/tcl8.4.2/lib/libtcl8.4.so
    libdl.so.2 => /lib/libdl.so.2 (0x400b4000)
    libpthread.so.0 => /lib/libpthread.so.0 (0x400b7000)
    libm.so.6 => /lib/libm.so.6 (0x400ce000)
    libc.so.6 => /lib/libc.so.6 (0x400f0000)
    /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x80000000)

So it is still beyond me why TclCompileScript (i.e. TclCompileScript@tclByteCodeType) cannot be found in libtcl8.4.so itself. The LD_PRELOAD trick won't work in my case because there is nothing I can preload to define TclCompileScript before it is needed in libtcl8.4.so, because it should be IN libtcl8.4.so!!

what does "ldd nsd" say? also
nm libtcl8.4.so  | egrep TclCompileScript\|tclByteCodeType
ldd /usr/local/progs/aolserver/bin/nsd
    /usr/local/tcl/tcl8.4.2/lib/libtcl8.4.so => /usr/local/tcl/tcl8.4.2/lib/libtcl8.4.so (0x40016000)
    libnsd.so => /usr/local/progs/aolserver-cvs/lib/libnsd.so (0x400b8000)
    libnsthread.so => /usr/local/progs/aolserver-cvs/lib/libnsthread.so (0x40105000)
    libdl.so.2 => /lib/libdl.so.2 (0x4011f000)
    libpthread.so.0 => /lib/libpthread.so.0 (0x40122000)
    libm.so.6 => /lib/libm.so.6 (0x40138000)
    libc.so.6 => /lib/libc.so.6 (0x4015a000)
    /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)

nm /usr/local/tcl/tcl8.4.2/lib/libtcl8.4.so | egrep TclCompileScript\|tclByteCodeType
0003ab40 T TclCompileScript
00099778 D tclByteCodeType

now i am really confused ... the damn thing is actually there...
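For what it's worth, the nm check above can be made a bit more mechanical. nm lists a defined symbol as three columns (address, type letter, name) and an undefined one as two (U, name), so filtering on the column count leaves only what the library itself provides. A rough sketch; defined_symbols is a made-up helper name, not a standard tool:

```shell
# Hypothetical helper: read nm output on stdin and print only the names
# of defined symbols, i.e. lines with address, type, and name columns.
defined_symbols() {
    awk 'NF == 3 { print $3 }'
}

# Usage against the library from this thread (prints the name if defined):
#   nm /usr/local/tcl/tcl8.4.2/lib/libtcl8.4.so | defined_symbols | grep -x TclCompileScript
```

This only confirms that the file on disk defines the symbol. It says nothing about whether that file came from a clean, consistent build.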

what about "strace nsd 2>&1 | grep libtcl"?
i cut back my LD_LIBRARY_PATH to nothing to reduce the paths searched (as i only need LD... set for the visualisation toolkit) and the result was:

: root: 00:32:36 /root ; strace /usr/local/progs/aolserver/bin/nsd -t /home/nobody/web/vorpal/vorpal.tcl -u nobody -g web 2>&1 | grep -i libtcl
open("/usr/local/tcl/tcl8.4.2/lib/libtcl8.4.so", O_RDONLY) = 3
open("/usr/local/progs/aolserver-cvs/lib/libtcl8.4.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/local/tcl/tcl8.4.2/lib/i686/mmx/libtcl8.4.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/local/tcl/tcl8.4.2/lib/i686/libtcl8.4.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/local/tcl/tcl8.4.2/lib/mmx/libtcl8.4.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/local/tcl/tcl8.4.2/lib/libtcl8.4.so", O_RDONLY) = 3
: root: 00:33:36 /root ;

so it does find it in the end!

i just read something in the unix/README for tcl8.4.1 which has jogged my memory....

when i compiled my tcl8.4.2 i accidentally inserted a space after my --prefix= ie before the path that i wanted to install to (doh!). So it installed to / and i had to remove it manually. In the readme it says i should do a make distclean if i change any parameters to configure... and i did not do so before i recompiled with the right --prefix
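The reason one stray space matters so much: the shell splits the command line on whitespace before configure ever sees it, so "--prefix= /some/path" arrives as an empty --prefix= followed by a separate, unrelated argument. A small sketch that makes the split visible (show_args is a made-up helper, not part of configure):

```shell
# Hypothetical helper: print each argument it receives on its own line,
# wrapped in brackets so empty or split values are easy to spot.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

show_args --prefix= /there/should/be/no/space/b4/this/path
# [--prefix=]
# [/there/should/be/no/space/b4/this/path]

show_args --prefix=/much/better/with/no/space/b4/this/path
# [--prefix=/much/better/with/no/space/b4/this/path]
```

In the first call the prefix is empty, which matches what is described above: the install went to / instead of the intended directory.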

So maybe that is the problem. Sorry to all if that does turn out to be the problem. After all my bitching it would of course be my own fault, wouldn't it :)

well,

my humble apologies to all those whose time has been wasted trying to help me overcome the consequences of my own stupidity.

I recompiled Tcl 8.4.2 after doing a make distclean to clean up the mess caused by running configure for Tcl 8.4.2 with:
--prefix= /there/should/be/no/space/b4/this/path

and then configuring again with:
--prefix=/much/better/with/no/space/b4/this/path
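Spelled out, the clean rebuild looks roughly like the sketch below. The run and rebuild_tcl helpers are made up for illustration and are not part of Tcl's build system; --enable-threads is the one extra option mentioned earlier in the thread. Setting DRY_RUN=1 only prints the steps.

```shell
# Hypothetical helper: echo a step, then execute it unless DRY_RUN is set.
run() {
    printf '+ %s\n' "$*"
    [ -n "${DRY_RUN:-}" ] || "$@"
}

# Hypothetical helper: rebuild_tcl SRC_UNIX_DIR PREFIX
# Assumes SRC_UNIX_DIR already holds a Makefile from an earlier configure run.
rebuild_tcl() {
    run cd "$1" &&
    run make distclean &&    # throw away everything the earlier configure produced
    run ./configure --prefix="$2" --enable-threads &&
    run make &&
    run make install
}
```

For the paths in this thread that would be something like "rebuild_tcl /usr/local/src/gui/tcl/tcl8.4.2/unix /usr/local/tcl/tcl8.4.2", where the prefix is a guess based on the --with-tcl path given to the AOLserver configure.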

It still amazes me that this seems to have been the cause of all my problems... i mean, really, is it too much to expect that calling configure a 2nd time with new options should overwrite all products of previous configures with the new results, without having to do a make distclean???

oh well.

I also tried compiling AOLserver4 CVS against tcl8.4.1 and yeah, no problems there either.

A final comment is in order though. I still would not have a working AOLserver were it not for the fact that I was able to scavenge a copy of nsrewrite.so from the build I did of Matt's AOLserver distribution. My CVS checkout of AOLserver 4 did not come with any code for nsrewrite.so, and the nsrewrite module is NOT available at SourceForge with all the other modules for AOLserver.

Besides that, the only very minor problem was that the sample OpenACS config file at:

https://openacs.org/doc/openacs-4/files/openacs4.tcl.txt

does not mention that OpenACS also needs AOLserver to load the nsdb module, so it needs to have the line:

ns_param  nsdb            ${bindir}/nsdb.so

added at the appropriate place.

Sorry again to all those whose time was wasted especially those who took the time to try to help me.

john

The OpenACS config file hasn't been updated for AOLserver 4.0 as almost everyone here uses the OpenACS distribution of AOLserver, which forked from the AOL version a long time ago.  Merging the distributions is a recent effort, coinciding with work on the 4.0 branch which includes several other improvements.

nsrewrite isn't actually used by OpenACS so you can remove it from your config file if you want.  The module itself is in transition from OpenACS to SourceForge.  nsdb as a separate module is an AOLserver 4-ism.

I'm glad you found a solution to your problem!