Forum OpenACS Q&A: Response to Profiling Tcl with the TclLib profiler package

Inspired by Vadim's efforts, I also tried to get the Tcllib profiler working to my satisfaction in AOLserver. I failed. Here's what I learned:

First of all, I grabbed the tcllib-1-3-0 tagged release of tcllib from SourceForge:

$ cvs -d:pserver:anonymous@cvs.tcllib.sourceforge.net:/cvsroot/tcllib login
$ cvs -z3 -d:pserver:anonymous@cvs.tcllib.sourceforge.net:/cvsroot/tcllib co -r tcllib-1-3-0 tcllib
All of Tcllib is pure Tcl (no C modules), and tcllib/modules/profiler/profiler.tcl doesn't use anything else from the rest of Tcllib, so to install the profiler code into AOLserver the simplest thing to do is to copy profiler.tcl into your AOLserver tcl/ library directory. That's what I did - as long as your AOLserver is using Tcl 8.3 it should be fine.

In my case, I'm running just AOLserver, no OpenACS, but I'm using ad_page_contract, the db_api, and a lot of other ACS Tcl code.

I experimented with both calling ::profiler::init at various points during AOLserver startup, and also with calling it only on a particular www/test.tcl page. As you'd expect, if you define proc foo_1 before you call ::profiler::init, foo_1 will be completely invisible to the profiler. So to be of any use to me, I need to call ::profiler::init sometime during AOLserver startup, so that I can profile my Tcl procedures.

I first tried to let profiler instrument everything, by having AOLserver source the profiler.tcl library file very early in the startup process (immediately after 0-acs-init.tcl, in my case), and then run ::profiler::init like so:

set tcllib_profiler_p [ns_config -bool "ns/server/[ns_info server]/module/mystuff" tcllib_profiler_p]
if { $tcllib_profiler_p } {
   ns_log Notice "Initializing tcllib profiler, calling ::profiler::init now."
   interp alias {} profiler_dump {} ::profiler::dump
   ::profiler::init
}

That didn't work very well, it broke ad_page_contract and lots of other stuff in weird ways. I didn't track down exactly what broke and what didn't, but for example, ad_page_contract would run without errors, but incoming variables wouldn't be set, and strange stuff like that.

So I changed the filenames in my AOLserver tcl/ around so that profiler would only instrument my particular application code - my filenames happened to look like this:

0-acs-init.tcl
00-proc-procs.tcl
10-database-procs.tcl
99-noacs-procs.tcl
995-defs-procs.tcl
995-deprecated-utilities-procs.tcl
995-memoize-procs.tcl
995-tcl-documentation-procs.tcl
995-utilities-procs.tcl
995-watchdog-procs.tcl
99m-profiler.tcl
99n-profiler-init.tcl
mystuff-procs
zz-mystuff-init.tcl
zz-watchdog-init.tcl

That really didn't work - AOLserver started segfaulting in some of my code, code that works with no problems as long as profiler is turned off. Interestingly (?), that same segfault appears to show different stacktraces when I run it under gdb vs. Purify:

gdb:

Program received signal SIGSEGV, Segmentation fault.
[Switching to LWP 7]
0x113e4c in CallTraces (iPtr=0x189d18, arrayPtr=0x0, varPtr=0xfdbfbb70, 
    part1=0x1 <Address 0x1 out of bounds>, part2=0x0, flags=32)
    at ./../generic/tclVar.c:4103
4103    ./../generic/tclVar.c: No such file or directory.

Purify:

COR: Fatal core dump
This is occurring while in thread 9:
      Tcl_FindCommand [tclNamesp.c:1983]
                 resPtr = iPtr->resolverPtr;
         
                 if (cxtNsPtr->cmdResProc) {
      =>             result = (*cxtNsPtr->cmdResProc)(interp,
      name,
                         (Tcl_Namespace *) cxtNsPtr, flags,
                         &cmd);
                 } else {
                     result = TCL_CONTINUE;
      SetCmdNameFromAny [tclExecute.c:5015]
      Tcl_GetCommandFromObj [tclExecute.c:4802]
      TclExecuteByteCode [tclExecute.c:761]
      Tcl_EvalObjEx  [tclBasic.c:2733]
      TclObjInterpProc [tclProc.c:1001]
Received signal 10 (SIGBUS - Bus Error)
Faulting address = 0x2
Signal mask: (SIGBUS) 
Pending signals:

And the segfaults weren't even the only strange behavior. For example, for debugging, sometimes I want to call ns_set print at certain places in my code. So, in my nsd.tcl file, I have an ns_param called debug_print_all_ns_sets_p, which I usually leave set to 0. Bizarrely, if I turn on profiler, my code started calling ns_set print even though debug_print_all_ns_sets_p was still set to 0 in my nsd.tcl.

A Tcl profiling tool for AOLserver would be very usefull, but even if I had the time, I've no idea how to proceed in debugging this. Any AOLserver internals gurus out there interested in looking into this?

Also, Roberto was just talking today about problems with AOLserver's handling of Tcl extensions, which I'd guess is related to these problems.