Forum OpenACS Q&A: nsthread failing - aolserver won't start

All of a sudden aolserver will not start on FreeBSD - the following appears as the last line in the log:

nsthread(48293) error: pthread_cond_timedwait failed in Ns_CondTimedWait: Invalid argument

Anyone have any idea what is causing this?

Thanks,

Posted by Marc Fournier on
Just to add a bit to this ... the OS is:

4.5-STABLE FreeBSD 4.5-STABLE #6: Mon Mar 25 21:01:05 CST 2002

This appears to affect both aolserver 3.4 and 3.4.2, built from FreeBSD ports ...

Posted by Marc Fournier on
Further to this ... the tail end of server.log shows:

[06/Apr/2002:19:36:37][7562.139301888][-sched:52-] Notice: Querying 'end transaction;'
[06/Apr/2002:19:36:37][7562.139301888][-sched:52-] Notice: Ns_PgExec: Committing transaction
[06/Apr/2002:19:36:37][7562.139301888][-sched:52-] Notice: dbinit: sql(db.nyckidsarts.org:5440:nyckidsarts_org): 'end transaction'
[06/Apr/2002:19:36:37][7562.135589888][-sched-] Notice: Done running scheduled proc ug_init_serve_group_pages.
nsthread(7562) error: pthread_cond_timedwait failed in Ns_CondTimedWait: Invalid argument

I've moved /usr/local/aolserver out of the way and re-built/installed it, as well as pgdriver ...

software is aolserver 3.4.2, pgdriver 2.0 and pgsql 7.2 ...

neat thing is, looking through the code for aolserver, I can't even find an 'Ns_CondTimedWait' function, although I can find its prototype in one of the .h files ...

Posted by David Walker on
look in thread/cthread.cpp
Posted by Marc Fournier on
yup, searched through it and finally found it ... but nothing there to indicate a problem ...

I have 13 aolserver/openacs installations on my machine, out of which 4 won't start with that self-same error ...

out of those 13, only 2 *aren't* at the same level of libraries, but the other 11 are sharing the same libraries ...

two of the 4 that won't start are running AOLServer 3.4.2, and the other two are running 3.4 ...

only commonality between the 4 is that they are all running openacs 3.2.5 ...

so, out of 11 that are running with the new libraries (7 @ 3.2.5 and 4 @ 4.x), 4 of the 3.2.5 ones won't start up, and the other 3 start up just fine ...

Posted by David Walker on
Looks to me like this condition could be caused by your startup taking longer
than startuptimeout, which defaults to 20 seconds. (I couldn't find it in the
documentation. I had to backtrack from pthread_cond_timedwait to find it.)

In your nsd.tcl file, add a startuptimeout parameter to the "ns/parameters"
section:
ns_section "ns/parameters"
ns_param startuptimeout 100

Your startup may be taking longer because of the Web Robots database.  The
new address is
ns_param WebRobotsDB http://www.robotstxt.org/wc/active/all.txt

Posted by Marc Fournier on
Perfect, thanks ... all 4 down servers now appear to be up again without any changes on my part ... I'm suspecting the 'slow load' does account for it, will have to check load avg if/when it happens next :(
Posted by David Walker on
My 3.3.1+ad13 on Linux doesn't seem to time out on startup.  Very
interesting.
Posted by Marc Fournier on
Try running

jupiter# ps aux | wc -l
    2733

processes on it and come back to compare results? :)

Posted by Ebenezer Botelho on
I had this problem last week and I noticed that I was passing a negative time in the struct timespec parameter. Are you sure that you are passing a positive time?
hugs.
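
For anyone who wants to check the timespec theory: the "Invalid argument" in the log is the text for EINVAL, which pthread_cond_timedwait returns when the absolute time it is handed is malformed (POSIX names a nanosecond field below zero or at/above one billion; some implementations may also reject negative seconds, which is an assumption here). Below is a rough sketch in C, not taken from the aolserver source. The helper names timespec_is_valid, deadline_after and timed_wait_status are made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical helper: 1 if ts is a well-formed absolute time, else 0. */
static int timespec_is_valid(const struct timespec *ts)
{
    return ts->tv_sec >= 0 && ts->tv_nsec >= 0 && ts->tv_nsec < NSEC_PER_SEC;
}

/*
 * Hypothetical helper: add a relative timeout to `now`, carrying any
 * nanosecond overflow (or borrow) into the seconds field so the result
 * stays inside the range pthread_cond_timedwait accepts.
 */
static struct timespec deadline_after(struct timespec now, long sec, long nsec)
{
    struct timespec d;

    assert(timespec_is_valid(&now));
    d.tv_sec = now.tv_sec + sec + nsec / NSEC_PER_SEC;
    d.tv_nsec = now.tv_nsec + nsec % NSEC_PER_SEC;
    if (d.tv_nsec >= NSEC_PER_SEC) {
        d.tv_sec += 1;
        d.tv_nsec -= NSEC_PER_SEC;
    } else if (d.tv_nsec < 0) {
        d.tv_sec -= 1;
        d.tv_nsec += NSEC_PER_SEC;
    }
    return d;
}

/*
 * Hypothetical helper: wait on a condition variable that nobody signals
 * and hand back the raw status code from pthread_cond_timedwait.
 */
static int timed_wait_status(const struct timespec *abstime)
{
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t c = PTHREAD_COND_INITIALIZER;
    int rc;

    pthread_mutex_lock(&m);
    rc = pthread_cond_timedwait(&c, &m, abstime);
    pthread_mutex_unlock(&m);
    return rc;
}
```

Calling timed_wait_status with a timespec whose tv_nsec is out of range should come back with EINVAL rather than ETIMEDOUT, which is the same failure text nsthread prints. Building the deadline through something like deadline_after, and checking it with timespec_is_valid before waiting, is one way to rule that cause in or out.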