Forum OpenACS Development: ACS crashes on TCL bad code offset?

Posted by Dan Lieberman on
Here's the last bit from our aolserver log before the server crashed:

[12/Mar/2003:07:30:06][31249.106508][-sched:6-] Notice: Running scheduled proc sec_sweep_sessions...
[12/Mar/2003:07:30:06][31249.106508][-sched:6-] Notice: Querying '
        delete from sec_sessions
        where  1047447006 - last_hit > 176800;'
[12/Mar/2003:07:30:06][31249.106508][-sched:6-] Notice: dbinit: sql(localhost::acs): '
        delete from sec_sessions
        where  1047447006 - last_hit > 176800
    '
[12/Mar/2003:07:30:06][31249.106508][-sched:6-] Notice: Done running scheduled proc sec_sweep_sessions.
GetCmdLocEncodingSize: bad code offset

Posted by Don Baccus on
"TCL bad code offset" ...

This tells me one of two things has happened:

1. You've stumbled across a bug in the Tcl bytecode compiler or

2. Bad RAM, or a bad write to swap, or something similar corrupted the compiled byte code.

If it persists after restarting AOLserver, then #1 is likely; if not, #2.

If #1 ... try the new AOLserver 4.0 beta3 with Tcl 8.4.2 to see if it has been fixed in the latest release.

Posted by Dan Lieberman on
Bob,
thanks. Well, I guess it's #2 - we restarted AOLserver and in several days of intensive activity haven't seen a relapse.
dL
Posted by Dan Lieberman on
Ran dmesg on the machine and sure enough got an NMI.

Uhhuh. NMI received. Dazed and confused, but trying to continue
s0 kernel: You probably have a hardware problem with your RAM chips

This is an important lesson for anyone experiencing
instability with AOLserver/OACS - FIRST check the kernel logs.
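
For anyone who wants to automate that first check, here is a minimal sketch in Python. The function name `find_hardware_warnings` and the pattern list are my own illustrative assumptions, not anything shipped with AOLserver or OpenACS, and the patterns are only a starting point based on the messages quoted above plus the common "machine check" wording. It simply filters kernel log text for lines that hint at hardware trouble.

```python
import re

# Illustrative patterns only (an assumption, not an exhaustive list):
# the two messages quoted in this thread, plus "machine check".
HARDWARE_PATTERNS = [
    re.compile(r"\bNMI\b"),
    re.compile(r"hardware problem", re.IGNORECASE),
    re.compile(r"machine check", re.IGNORECASE),
]


def find_hardware_warnings(log_text):
    """Return the log lines that match any hardware-trouble pattern."""
    return [
        line
        for line in log_text.splitlines()
        if any(pattern.search(line) for pattern in HARDWARE_PATTERNS)
    ]


# Sample input: the two kernel messages from this thread plus one
# unrelated line that should be ignored.
sample = """\
Uhhuh. NMI received. Dazed and confused, but trying to continue
s0 kernel: You probably have a hardware problem with your RAM chips
eth0: link up
"""

for line in find_hardware_warnings(sample):
    print(line)
```

To run it against a live machine, one option is to pass in the text from `subprocess.run(["dmesg"], capture_output=True, text=True).stdout`. Reading the kernel log may require root on some systems.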