Forum OpenACS Q&A: Don't run ns_server on production sites; updated monitor.tcl

Yesterday while debugging a crashed AOLserver4, Dossy pointed out that we shouldn't be calling ns_server on a production site.  It turns out this call is not thread safe and can cause crashes.  See the NOTES section of this page:

http://www.panoptic.com/wiki/aolserver/653

I wasn't aware of this.  So it turns out that the monitoring page at /packages/acs-admin/www/monitor.tcl calls ns_server to display information about running threads.  Furthermore I'd made a habit of calling this page frequently after a restart, to be sure that connections were being processed in an orderly way and not piling up.  The act of checking for stability was causing crashes!

Anyway, I pulled that page off of production and things seem to be stable.

I've also rewritten monitor.tcl to use ns_info instead of ns_server.  Dossy indicated that this should be safe, and I've browsed the sources a bit to verify that the two commands don't use the same code to get the information (they don't).

The rewritten version is committed to CVS on the oacs-5-1 branch.

There's one other place in standard CVS that calls ns_server -- an unused proc called ad_return_if_another_copy_is_running.  This needs to be fixed as well.
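
For anyone who wants to check a local checkout or custom packages for other callers, here is a minimal sketch of a source scan. It is not part of OpenACS: the function name find_ns_server_calls and the choice of .tcl and .adp suffixes are assumptions made for illustration. It is a plain text match, so comments that mention ns_server will show up alongside real calls.

```python
import re
from pathlib import Path

# Match ns_server as a whole word so longer names such as ns_server_foo are ignored.
NS_SERVER = re.compile(r"\bns_server\b")


def find_ns_server_calls(root, suffixes=(".tcl", ".adp")):
    """Return (path, line number, stripped line) for each line mentioning ns_server.

    Hypothetical helper: the suffix list is an assumption about where
    Tcl code lives in the tree being scanned.
    """
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        # errors="replace" keeps the scan going on files with odd encodings.
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if NS_SERVER.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Scan the current directory; point this at your packages directory instead.
    for path, lineno, line in find_ns_server_calls("."):
        print(f"{path}:{lineno}: {line}")
```

Each hit still needs a human look to decide whether it is a live call on a page that production traffic can reach.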

I put a comment in ad_return_if_another_copy_is_running to that effect. Thanks, Andrew!

ns_info is unstable too -- crashed our server :(

I checked in with Dossy and they found the same.  They're working on trying to find a stable, well-performing alternative.

I'll update this post, and the code, when I hear more.