Forum OpenACS Q&A: Re: Fatal: received fatal signal 11 - new error after years!

Ryan,

Was the last request from "2012:18:57:22 -0400" also on conn:1? It would help if you could report the last few error log entries, including previous error.log entries from conn:1.

Thread start and thread end are not unusual places where nsd might crash. E.g. during thread cleanup, all thread specific resources are freed, so any memory corruption will lead to a crash there.

What version of aolserver + tcl + libthread are you using? If you have xotcl-core installed, the quickest way is to check http://YOURSERVER/xotcl/version-numbers

Did you compile Tcl + aolserver + modules yourself? What platform? What version of gcc? What optimization flags?

Background: we experienced many problems under load (more than 800 concurrent users) with gcc 4.1.2 on POWER6+. Most of these were in the thread-local-storage management of tcl 8.5.*, where one sees e.g. crashes during regexp, when the internal representation of regular expressions is kept in thread-local storage. I rewrote some of these parts in tcl to use simpler platform/compiler-specific implementations of thread-local storage; then the problem moved to other places in the tcl implementation, also related to TLS. We were never able to produce simple test cases to trigger this crash. Interestingly enough, we never experienced the problem with exactly the same code base on intel, not even with 3000+ concurrent users. ... The message is that the platform/compiler/optimization flags might matter.

Did you get a core dump from the crash? If so, to narrow down the problem space, use "gdb /SOMEPATH/nsd core.XXXX" and then type "bt" in gdb to see where the crash happened.
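
If you prefer not to type into an interactive gdb session, the same backtrace can be captured in one go, which also makes it easy to paste here. This is only a rough sketch: the function name nsd_backtrace and its two arguments are made up for illustration, and the paths are placeholders for wherever your nsd binary and core file actually live. It relies on gdb's -batch and -ex options.

```shell
#!/bin/sh
# Sketch only: nsd_backtrace is a hypothetical helper, not part of aolserver.
# Usage: nsd_backtrace /SOMEPATH/nsd core.XXXX
nsd_backtrace() {
    nsd_bin="$1"     # path to the nsd binary that crashed
    core_file="$2"   # path to the core file it left behind

    if [ ! -x "$nsd_bin" ]; then
        echo "nsd binary not found or not executable: $nsd_bin" >&2
        return 1
    fi
    if [ ! -r "$core_file" ]; then
        echo "core file not readable: $core_file" >&2
        return 1
    fi

    # -batch makes gdb exit after running the given commands;
    # each -ex runs one gdb command. "bt" shows the crashing thread,
    # "thread apply all bt" shows the stacks of all threads.
    gdb -batch -ex "bt" -ex "thread apply all bt" "$nsd_bin" "$core_file"
}
```

Redirecting the output to a file (e.g. nsd_backtrace /SOMEPATH/nsd core.XXXX > bt.txt) gives you something you can attach to the thread.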

-gustaf neumann

Ryan,

Just a small note: to have the nsd process generate a core dump, be sure to increase the maximum size of core files that the user running the nsd process can create (usually you want an unlimited size; just run 'ulimit -c unlimited' before you start the nsd process, you can add that temporarily to your startup script or so).

Then the core dump will be written to the Aolserver home directory (e.g. /usr/local/aolserver/). Also make sure that the user running the process has the correct permissions to write into that directory; otherwise it won't be able to generate the core dump.

Also, in case you have nasty code that changes directories for whatever reason, the core dump may be written to a directory other than the Aolserver home directory, so just be aware of that.
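
To put the points above together, something along these lines could go near the top of the startup script, before nsd is launched. Treat it as a sketch under assumptions: core_dump_ready is a name I made up, and the directory you pass in is whatever your Aolserver home directory really is (the /usr/local/aolserver/ above is only an example). It checks that the directory is writable by the current user and then tries to raise the core file size limit for the current shell.

```shell
#!/bin/sh
# Sketch only: core_dump_ready is a hypothetical helper for a startup script.
# Returns 0 if core dumps should be possible, 1 if the directory is not
# writable, 2 if the core size limit could not be raised.
core_dump_ready() {
    home_dir="$1"   # directory the core file is expected to land in

    # The user running nsd must be able to write into this directory.
    if [ ! -d "$home_dir" ] || [ ! -w "$home_dir" ]; then
        echo "core directory missing or not writable: $home_dir" >&2
        return 1
    fi

    # Raise the soft limit for this shell and the processes it starts.
    # This can fail if the hard limit is lower than "unlimited".
    if ! ulimit -c unlimited 2>/dev/null; then
        echo "could not raise core size limit (hard limit: $(ulimit -H -c))" >&2
        return 2
    fi

    echo "core size limit: $(ulimit -c), core directory: $home_dir"
}
```

Call it directly in the script (not inside $(...) or a pipeline), since the limit only applies to the shell that runs ulimit and to the processes started from it afterwards.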

Best,