Forum OpenACS Development: Re: Fatal Signal 11 Upon Startup of Naviserver on OS X

Posted by Gustaf Neumann on
Hi Dave,

Could it be that the config file includes a .so file from a different release? Do you have a chance to run lldb/gdb on that machine? Use a command like the following, substituting the config file you are using:

% sudo lldb /usr/local/ns/bin/nsd 
Current executable set to '/usr/local/ns/bin/nsd' (x86_64).
(lldb) run -t /usr/local/ns/oacs-5-9.tcl -u nsadmin -g nsadmin -f
When it crashes, type "bt" (for backtrace) to see where it happens.
Posted by Dave Bauer on
Thanks for the help.

Here is what it says.

(lldb) bt
* thread #7: tid = 0x162d675, 0x00007fff9c88b682 libsystem_c.dylib`strcasecmp_l + 89, stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
* frame #0: 0x00007fff9c88b682 libsystem_c.dylib`strcasecmp_l + 89
frame #1: 0x000000010002b450 libnsd.dylib`Ns_SetFindCmp + 66
frame #2: 0x000000010000e011 libnsd.dylib`ConfigGet + 63
frame #3: 0x000000010000e39b libnsd.dylib`Ns_ConfigIntRange + 106
frame #4: 0x00000001000259cd libnsd.dylib`NsConnThread + 254
frame #5: 0x0000000100082976 libnsthread.dylib`NsThreadMain + 130
frame #6: 0x000000010008338a libnsthread.dylib`ThreadMain + 9
frame #7: 0x00007fff9f27bc13 libsystem_pthread.dylib`_pthread_body + 131
frame #8: 0x00007fff9f27bb90 libsystem_pthread.dylib`_pthread_start + 168
frame #9: 0x00007fff9f279375 libsystem_pthread.dylib`thread_start + 13

Posted by Dave Bauer on
There was more; here is the entire result:

[14/Jan/2016:18:01:02][97189.7fff7ddd2000][-main-] Notice: driver: starting: nssock
Process 97189 stopped
* thread #7: tid = 0x162d675, 0x00007fff9c88b682 libsystem_c.dylib`strcasecmp_l + 89, stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
frame #0: 0x00007fff9c88b682 libsystem_c.dylib`strcasecmp_l + 89
libsystem_c.dylib`strcasecmp_l:
-> 0x7fff9c88b682 <+89>: movzbl (%r15), %edi
0x7fff9c88b686 <+93>: testb %dil, %dil
0x7fff9c88b689 <+96>: js 0x7fff9c88b695 ; <+108>
0x7fff9c88b68b <+98>: movl 0x43c(%r13,%rdi,4), %eax
(lldb) bt
* thread #7: tid = 0x162d675, 0x00007fff9c88b682 libsystem_c.dylib`strcasecmp_l + 89, stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
* frame #0: 0x00007fff9c88b682 libsystem_c.dylib`strcasecmp_l + 89
frame #1: 0x000000010002b450 libnsd.dylib`Ns_SetFindCmp + 66
frame #2: 0x000000010000e011 libnsd.dylib`ConfigGet + 63
frame #3: 0x000000010000e39b libnsd.dylib`Ns_ConfigIntRange + 106
frame #4: 0x00000001000259cd libnsd.dylib`NsConnThread + 254
frame #5: 0x0000000100082976 libnsthread.dylib`NsThreadMain + 130
frame #6: 0x000000010008338a libnsthread.dylib`ThreadMain + 9
frame #7: 0x00007fff9f27bc13 libsystem_pthread.dylib`_pthread_body + 131
frame #8: 0x00007fff9f27bb90 libsystem_pthread.dylib`_pthread_start + 168
frame #9: 0x00007fff9f279375 libsystem_pthread.dylib`thread_start + 13

Posted by Avni Khatri on
Dave found a fix for this. If we uncomment the connsperthread parameter in the nsd config file (e.g. ns_param connsperthread 1000), that seems to fix the issue. We will test on other machines tonight.

Avni

Posted by Gustaf Neumann on
Setting "connsperthread" in the config file should not be necessary; the default is a reasonable value. However, the crash is very strange: I have been using NaviServer daily for many years on a Mac OS X notebook. Can you send me the config file causing the crash? Which versions of Mac OS X and NaviServer does this happen with?
Posted by Gustaf Neumann on
I have a suspicion: the ns_set used for the configuration variables is implemented without mutex locks, probably under the assumption that only the main thread reads and modifies config data during startup. Since "connsperthread" is read and written back during startup, there can be a race condition.

This would explain the situation, since it depends on the configuration, the speed of the machine, and so on. I will look into this more deeply, probably at the weekend.
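
To illustrate the kind of race suspected above, here is a minimal sketch in C. It is not NaviServer code: ConfigSet, config_int, reader and run_readers are hypothetical names, and the value 10000 is only a placeholder default. The sketch assumes a lookup that writes a missing default back into a shared set, and shows one way to make that safe when several threads start at once, namely by guarding both the search and the write-back with a mutex. Without the lock, one thread could grow the array while another is still comparing keys in it, which would match a crash inside strcasecmp.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Hypothetical stand-in for a config set; NOT the real Ns_Set. */
typedef struct { char *key; char *value; } Field;

typedef struct {
    Field *fields;
    size_t size;
    size_t capacity;
    pthread_mutex_t lock;   /* guards fields, size and capacity */
} ConfigSet;

static void config_init(ConfigSet *set) {
    set->fields = NULL;
    set->size = 0;
    set->capacity = 0;
    pthread_mutex_init(&set->lock, NULL);
}

/* Case-insensitive search; the caller must hold set->lock. */
static long find_locked(const ConfigSet *set, const char *key) {
    for (size_t i = 0; i < set->size; i++) {
        if (strcasecmp(set->fields[i].key, key) == 0) return (long)i;
    }
    return -1;
}

/* Return the integer value for key; if absent, store def and return it. */
static int config_int(ConfigSet *set, const char *key, int def) {
    assert(set != NULL && key != NULL);
    int result = def;
    pthread_mutex_lock(&set->lock);
    long idx = find_locked(set, key);
    if (idx >= 0) {
        result = atoi(set->fields[idx].value);
    } else {
        if (set->size == set->capacity) {
            /* Growing may move the array, so readers must be locked out. */
            size_t ncap = set->capacity ? set->capacity * 2 : 4;
            Field *grown = realloc(set->fields, ncap * sizeof(Field));
            if (grown == NULL) goto done;
            set->fields = grown;
            set->capacity = ncap;
        }
        char buf[32];
        snprintf(buf, sizeof buf, "%d", def);
        char *k = strdup(key);
        char *v = strdup(buf);
        if (k == NULL || v == NULL) { free(k); free(v); goto done; }
        set->fields[set->size].key = k;
        set->fields[set->size].value = v;
        set->size++;
    }
done:
    pthread_mutex_unlock(&set->lock);
    return result;
}

/* Each reader thread asks repeatedly for the same unset parameter. */
static void *reader(void *arg) {
    ConfigSet *set = arg;
    for (int i = 0; i < 1000; i++) {
        (void)config_int(set, "connsperthread", 10000); /* placeholder default */
    }
    return NULL;
}

enum { MAX_READERS = 16 };

/* Start up to MAX_READERS threads, wait for them, return how many ran. */
static int run_readers(ConfigSet *set, int nthreads) {
    pthread_t tids[MAX_READERS];
    int started = 0;
    if (nthreads > MAX_READERS) nthreads = MAX_READERS;
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[started], NULL, reader, set) == 0) started++;
    }
    for (int i = 0; i < started; i++) pthread_join(tids[i], NULL);
    return started;
}
```

Calling config_init on a ConfigSet and then run_readers(&set, 8) should leave exactly one "connsperthread" entry in the set, however the threads interleave. Whether the actual fix uses a lock or avoids the write-back altogether is not stated in this thread.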

Posted by Gustaf Neumann on
I've looked deeper into this issue. I am pretty sure that there is/was a race condition in the code. The only occurrence of this type of problem is indeed the setting of "connsperthread"; therefore, leaving the default value for this parameter solves this issue.

The issue is already present in NaviServer 4.99.0 (released July 2005). There is now a fix for this in the tip version on Bitbucket, which will be included in the next release of NaviServer.

Posted by Dave Bauer on
Thanks, Gustaf, for all your help.