Forum OpenACS Q&A: Profiling Tcl with the TclLib profiler package

I spent a couple of days trying to get the TclLib profiler (http://tcllib.sourceforge.net/doc/profiler.html) working in ACS.

I was able to get it to work with Tcl pages in a bare-bones AOLServer instance with no (Open)?ACS.

It doesn't seem to mix very well with the way AOLServer embeds the Tcl interpreter.

To give you a quick idea of how/why the profiler could be useful, here's the kind of statistics that it collects:

::repeat
        callCount       6
        callerDist      GLOBAL 1 ::repeat 5
        compileTime     5283
        totalRuntime    44687
        averageRuntime  7447
        stddevRuntime   8272
        covpercentRuntime       111.1
        descendantTime  27913
        averageDescendantTime   4652
        descendants     ::repeat ::v_proc
::v_proc
        callCount       20
        callerDist      ::repeat 20
        compileTime     1026
        totalRuntime    7502
        averageRuntime  375
        stddevRuntime   166
        covpercentRuntime       44.3
        descendantTime  0
        averageDescendantTime   0
        descendants

For a more detailed description of how far I was able to progress, see the writeup that I put in file storage.

It would be great if someone knowledgeable about the AOLServer internals could take a look at this, if they decide that being able to profile Tcl code in OpenACS   might be a good thing.

Collapse
Posted by Don Baccus on
It would be a *great* thing.  AOLserver only supports TclPro profiling for Adp pages because they no longer recommend turning on the Tcl interpreter and internally they only use Tcl as embedded snippets in Adp pages.  Along with registered procs and maybe filters.

Some sort of profiling would be wonderful, integrated with acs-developer-support which already provides profiling information for every database query use to build a page.

Inspired by Vadim's efforts, I also tried to get the Tcllib profiler working to my satisfaction in AOLserver. I failed. Here's what I learned:

First of all, I grabbed the tcllib-1-3-0 tagged release of tcllib from SourceForge:

$ cvs -d:pserver:anonymous@cvs.tcllib.sourceforge.net:/cvsroot/tcllib login
$ cvs -z3 -d:pserver:anonymous@cvs.tcllib.sourceforge.net:/cvsroot/tcllib co -r tcllib-1-3-0 tcllib
All of Tcllib is pure Tcl (no C modules), and tcllib/modules/profiler/profiler.tcl doesn't use anything else from the rest of Tcllib, so to install the profiler code into AOLserver the simplest thing to do is to copy profiler.tcl into your AOLserver tcl/ library directory. That's what I did - as long as your AOLserver is using Tcl 8.3 it should be fine.

In my case, I'm running just AOLserver, no OpenACS, but I'm using ad_page_contract, the db_api, and a lot of other ACS Tcl code.

I experimented with both calling ::profiler::init at various points during AOLserver startup, and also with calling it only on a particular www/test.tcl page. As you'd expect, if you define proc foo_1 before you call ::profiler::init, foo_1 will be completely invisible to the profiler. So to be of any use to me, I need to call ::profiler::init sometime during AOLserver startup, so that I can profile my Tcl procedures.

I first tried to let profiler instrument everything, by having AOLserver source the profiler.tcl library file very early in the startup process (immediately after 0-acs-init.tcl, in my case), and then run ::profiler::init like so:

set tcllib_profiler_p [ns_config -bool "ns/server/[ns_info server]/module/mystuff" tcllib_profiler_p]
if { $tcllib_profiler_p } {
   ns_log Notice "Initializing tcllib profiler, calling ::profiler::init now."
   interp alias {} profiler_dump {} ::profiler::dump
   ::profiler::init
}

That didn't work very well, it broke ad_page_contract and lots of other stuff in weird ways. I didn't track down exactly what broke and what didn't, but for example, ad_page_contract would run without errors, but incoming variables wouldn't be set, and strange stuff like that.

So I changed the filenames in my AOLserver tcl/ around so that profiler would only instrument my particular application code - my filenames happened to look like this:

0-acs-init.tcl
00-proc-procs.tcl
10-database-procs.tcl
99-noacs-procs.tcl
995-defs-procs.tcl
995-deprecated-utilities-procs.tcl
995-memoize-procs.tcl
995-tcl-documentation-procs.tcl
995-utilities-procs.tcl
995-watchdog-procs.tcl
99m-profiler.tcl
99n-profiler-init.tcl
mystuff-procs
zz-mystuff-init.tcl
zz-watchdog-init.tcl

That really didn't work - AOLserver started segfaulting in some of my code, code that works with no problems as long as profiler is turned off. Interestingly (?), that same segfault appears to show different stacktraces when I run it under gdb vs. Purify:

gdb:

Program received signal SIGSEGV, Segmentation fault.
[Switching to LWP 7]
0x113e4c in CallTraces (iPtr=0x189d18, arrayPtr=0x0, varPtr=0xfdbfbb70, 
    part1=0x1 <Address 0x1 out of bounds>, part2=0x0, flags=32)
    at ./../generic/tclVar.c:4103
4103    ./../generic/tclVar.c: No such file or directory.

Purify:

COR: Fatal core dump
This is occurring while in thread 9:
      Tcl_FindCommand [tclNamesp.c:1983]
                 resPtr = iPtr->resolverPtr;
         
                 if (cxtNsPtr->cmdResProc) {
      =>             result = (*cxtNsPtr->cmdResProc)(interp,
      name,
                         (Tcl_Namespace *) cxtNsPtr, flags,
                         &cmd);
                 } else {
                     result = TCL_CONTINUE;
      SetCmdNameFromAny [tclExecute.c:5015]
      Tcl_GetCommandFromObj [tclExecute.c:4802]
      TclExecuteByteCode [tclExecute.c:761]
      Tcl_EvalObjEx  [tclBasic.c:2733]
      TclObjInterpProc [tclProc.c:1001]
Received signal 10 (SIGBUS - Bus Error)
Faulting address = 0x2
Signal mask: (SIGBUS) 
Pending signals:

And the segfaults weren't even the only strange behavior. For example, for debugging, sometimes I want to call ns_set print at certain places in my code. So, in my nsd.tcl file, I have an ns_param called debug_print_all_ns_sets_p, which I usually leave set to 0. Bizarrely, if I turn on profiler, my code started calling ns_set print even though debug_print_all_ns_sets_p was still set to 0 in my nsd.tcl.

A Tcl profiling tool for AOLserver would be very usefull, but even if I had the time, I've no idea how to proceed in debugging this. Any AOLserver internals gurus out there interested in looking into this?

Also, Roberto was just talking today about problems with AOLserver's handling of Tcl extensions, which I'd guess is related to these problems.

Collapse
4: C-based TCL profiler (response to 1)
Posted by Eric Lorenzo on

I attempted to get the tcllib profiler working under AOLserver a while back, and failed miserably. I've forgotten the details, though. I'd be a little worried about the quality of the data collected by the tcllib profiler, anyways -- from what I remember, the actual profile data collection was written in TCL. I would imagine that this would add significant overhead to TCL execution, and would skew the resulting data somewhat.

Since I couldn't get the tcllib profiler working, and I wasn't sure that it would work very well, and had been getting the urge to do some C hacking, I wrote my own TCL profiler as an AOLserver C module. I'm actually in the middle of cleaning it up a bit at the moment, and will probably post it for download within the next couple of days.

I have very mixed feelings about the usefulness of this C-based profiler, though. For one thing, if I understand what I'm seeing in the TCL interpreter source code correctly, the mechanism I'm using to catch and log proc invocations prevents the interpreter from storing any compiled bytecode, which will surely skew the results somewhat. And I think that the database is still going to be the biggest bottleneck, most of the time.

Eric, your AOLserver profiler module sounds very usefull! I look forward to trying it out.
Collapse
6: AOLserver nsprofile module (response to 1)
Posted by Andrew Piskorski on
I've found two separate versions on SourceForge of Curtis Galloway's nsprofile module for AOLserver, one in the AOLserver project (CVS) and one in the ACS-Misc project (CVS). And, the exact ancestory of the code is unclear to me, but it sounds like Curtis originally based his nsprofile code on Mark Diekhans' code implementing the profile command in TclX.

Has anyone experimented with any of these profilers?

Collapse
Posted by Andrew Piskorski on
Well, I meant to post further information on nsprofile and Eric Lorenzo's new tclprof module, but never got around to it, so let me try to remedy that now: I used both nsprofile and tclprof briefly. I did not review the code of either in any detail, but I did take a quick look. Here's what I learned, and my thoughts for further work:

nsprofile can only instrument a single thread at a time, and it's data output formatting is very primitive. It will work for any one thread though, you just have to stick in explicit 'profile on' and 'profile off' Tcl commands. Tclprof, on the other hand, will automagically profile stuff in many different threads at once, and present the results nicely, but as the code stands today, it only works for connection threads.

In my case, I was specifically interested in profiling Tcl code in a single long-running detached thread. My needs were simple and very specific, so I just used nsprofile, since it supported my situation out of the box while tclprof did not.

However, tclprof appears to be more sophisticated than nsprofile, and is definitely better documented. Going forward, I would first compare the tclprof and nsprofile code to understand their strategies, if one codebase could benefit from the other, etc. I believe that at the time Eric wrote tclprof, he wasn't aware nsprofile existed, so AFAIK to date no one has done any such comparison.

Here are some specific comments on how I'd start improving tclprof:

First of all, the tclprof C functions should all be renamed to conform to AOLserver standards by prefixing them all with "Tp_" or something like that, but that's a very easy fix.

To make tclprof work for non-conn threads, first, ProfileTrace needs to be changed to not abort data capture when there is no Ns_Conn available. [I did that, easy.] Then, it looks like CleanupConnection is never run for non-conn threads, so the RequestTrace is never published into long-term storage, and thus www/index.tcl never shows anything for that thread. This is fundamentally why tclprof currently only works for conn threads. Since there's no logically usefull "end" to threads in general the way there is for a conn thread serving a particular page, the only obvious way I can think of to address that is to add tclprof start and stop Tcl commands, where 'tclprof stop' calls CleanupConnection. Then I could explicitly call 'tclprof start' and 'tclprof stop' in the Tcl code I want instrumented. [I didn't try that, but I think it should be straightforward.]

Eric mentioned that he was considering doing some magic so that ns_schedule threads would be automagically profiled, without you adding any sort of profiling "start" or "stop" commands into your scheduled jobs. That could be usefull, but I think it's mostly orthoganal to adding the "start" and "stop" Tcl commands that I suggest above. I would add the start and stop profiling commands first, which is simple and generally usefull, and only worry about more fancy automagical features later.

Also, one of Eric's design goals for tclprof seems to have been automagical profiling, where you just turn on the tclprof module, and everything (or most things) gets profiled without you changing a single line of your Tcl code - it does that for conn threads right now. This sort of automagical profiling sounds very cool, but it could also be very important to be able to turn off such automagical behavior, so you can use tclprof in a more targetted fashion, without impact on all the other running code that you're not interested in. For instance, imagine needing to run tclprof in a heavily loaded production server, in order to track down some specific performance problem that you can't easily replicate on your Dev server. If you already have a decent idea of where the problem code is, you probably don't want to take the (potentially very large) performance hit of running tclprof on everything.

Collapse
Posted by Dave Bauer on
Andrew sent me the tclprof code, and with one change, I compiled it for AOLserver 4. I still haven't figured out what to do with the numbers, but I think it might be a useful package.
Collapse
Posted by Jamie Rasmussen on
I also got the code from Andrew (hopefully someone will respond to his AOLserver mailing list question about importing it into CVS).  I'm going to be trying the module out on AOLserver 4 soon - what change did you need to make Dave?
Collapse
Posted by Jamie Rasmussen on
ah - was it changing Ns_TclGetConn(NULL) to Ns_TclGetConn(interp)?
Collapse
Posted by Dave Bauer on
Jamie,

THat's it.

I found that it can still crash AOLserver if you test a large number of requests in a short time. I did not have a chance to investigate why.

Dave