Forum OpenACS Q&A: nsd crash after "ns_returnredirect"

Hi,

I am getting a reproducible nsd crash each time ns_returnredirect is used. The redirect actually works, but then the server dies with a single log message (debug enabled):

[09/Aug/2006:12:27:58][7817.1082550624][-conn:0-] Fatal: received fatal signal 11

NsTclReturnRedirectObjCmd and Ns_ConnReturnRedirect seem to finish OK, and if there is anything in the ADP after the ns_returnredirect call, I see that in the trace, too.

So apparently the server dies not IN the call, but AFTER it.

Any suggestions?

Thanks,
~ Alex.

Posted by Torben Brosten on
Depending on the application, you might try increasing the AOLserver stacksize in config.tcl.

Posted by Alex Andryushkin on
Thanks for the suggestion.
I tried setting it to 1M (8 times the default size) and don't see any difference; it keeps crashing.

~ Alex.

Posted by Torben Brosten on
Any chance the new page has content that references an http image from an https connection, or vice versa?

For example, http://referred-to-page has content: <img src="https://.."?

Posted by Alex Andryushkin on
Nothing like that. The URL does not make a difference.

The initial page (index.adp) may be as simple as this:

<%
ns_returnredirect "http://server:8080/alive.html"
%>

and alive.html contains only the single word "alive".

I should also say that this issue seems to be specific to the 64-bit version of nsd.

Posted by Torben Brosten on
You're using aolserver 4.0.10, right?

Are there any warnings when the server starts?

Posted by Gustaf Neumann on
Are you sure you have no mix-up between different Tcl versions on your system? Always use --with-tcl.

The mistake is on your side; we have compiled the whole suite for 64-bit Opterons and POWER5+ (our production environment), and the environment is very stable. If you want someone to help you with the problem, you have to provide more details about what might be special in your setup.

Posted by Alex Andryushkin on
Hi,

It's 4.5.0
No warnings at all ... just "signal 11"

Thanks

Posted by Alex Andryushkin on
Hi, Gustaf

Quite possible ...

However, this is how I compiled it:

#./configure --enable-threads --enable-symbols --prefix=/usr/lib/aolserver4/ --exec_prefix=/usr/lib/aolserver4/ --with-tcl=/usr/lib/tcl8.4
#make
#make install

What puzzles me is that the server seems to have no other issues except with "redirect".
Could you suggest a better method to debug this problem?

Thanks

Posted by Alex Andryushkin on
OK, it looks like no one has a quick answer, so I had to make time and play with gdb and the source code a little. Here are my findings:

file nsd/adpeval.c, function NsAdpLogError(NsInterp *itPtr), around line 824

At some point framePtr, which is assigned as
framePtr = itPtr->adp.framePtr;
turns out to be zero (NULL), which obviously causes a SIGSEGV a couple of lines later, in this code:

Ns_DStringPrintf(&ds, "\n at line %d of ",
                 framePtr->line + interp->errorLine);

I have no idea why framePtr is zero; it would be great to hear from the nsd developers.

However, this quick fix takes care of the *crash* part:

if (!framePtr) {
    Ns_Log(Fatal, "AAA - zero framePtr");
    return;
}
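
For anyone who wants to try the guard outside of nsd, here is a minimal, self-contained sketch of the same idea. The types AdpFrame and Interp and the function log_adp_error are simplified, hypothetical stand-ins invented for illustration; they are not the real nsd definitions, and the sketch says nothing about why framePtr ends up zero in the first place.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the nsd types (assumption:
 * only the fields touched by the crashing line are modelled here). */
typedef struct AdpFrame {
    int line;
} AdpFrame;

typedef struct Interp {
    AdpFrame *framePtr;  /* assumed to be NULL when no ADP frame is active */
    int errorLine;
} Interp;

/* Formats an error location into buf.
 * Returns 0 on success, -1 if there is no frame to report on. */
int log_adp_error(const Interp *itPtr, char *buf, size_t len)
{
    const AdpFrame *framePtr = itPtr->framePtr;

    if (framePtr == NULL) {
        /* The guard: bail out instead of dereferencing a NULL pointer. */
        snprintf(buf, len, "ADP error outside of any frame");
        return -1;
    }
    snprintf(buf, len, "at line %d", framePtr->line + itPtr->errorLine);
    return 0;
}
```

Calling log_adp_error with framePtr set to NULL returns -1 rather than crashing; with a valid frame it formats the line number as the original code would. Like the quick fix, this only hides the symptom.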

Please comment

~ Alex.

PS this is 4.5.0, not 4.0.10

Posted by Vinod Kurup on
Hi Alex,

I haven't tried 4.5 yet, but someone has reported the same problem that you're having on the AOLserver list.

http://comments.gmane.org/gmane.comp.web.aolserver/13405

You may want to report your findings there.