Forum OpenACS Development: One week of using naviserver on openacs.org
Aggregated performance analysisPerformance last year (see year chart, column Avg): avg response time: 325ms/req
Performance last week (see year chart, column Avg): avg response time: 81ms/req
This means that under the same load, on the same machine, the new installation is about 4 times faster.
In the chart below, one can see clearly the performance increase in the drop of the response-time line in the month-graph in week 31.
Detail analysis of openacs.orgThe detail analysis shows, that we have an average filter time (spent in the NaviServer filters for cookie management, site-node resolving of 3.6ms and for handling the page content of 67.5ms). 0.4% of the requests were queued, all other requests were directly assigned to connection threads. About 30% of the requests were spooled (i.e. sent via async delivery through the writer threads). Request-dispatch in new in NaviServer since 4.99.5, async delivery is a well not available in aolserver. The detailed timing information is as well a NaviServer feature.
Request Handling: requests 447683, queued 1796 (0.40%), spooled 141218 (31.54%)
Request Timing: avg queue time 0.0024s, avg filter time 0.0036s, avg run time 0.0676s
The performance of openacs.org is completely un-tuned, based just on the openacs version 5.8 from CVS head.
Detail analysis of dotlrn.orgQuite interesting is dotlrn.org, which is running as a virtual server under openacs.org (the same nsd process handles openacs.org and dotlrn.org; this configuration was not possible under aolserver). dotlrn.org serves essentially static content using the native content delivery mechanism of NaviServer (i.e. just fastpath, not using the openacs request handler). dotlrn.org recieves 15% of the requests of openacs.org. Based on the 63357 requests in 6 days, the average runtime per request is for dotlrn.org 0.2ms, that is on average 300 times faster than the runtime on openacs.org and 18 times faster than the filter time alone.
Request Handling: requests 63357, queued 10 (0.02%), spooled 6212 (9.80%)
Request Timing: avg queue time 0.0001s, avg filter time 0.0000s, avg run time 0.0002s
The performance improvements are partly due to (a) naviserver, (b) PostgreSQL 9.2 and (c) nsf/xotcl 2.0.
For reference, the machine has a "Intel(R) Xeon(R) CPU X5650 @ 2.67GHz" using Ubuntu 10.04 64bit and was bought in 2010.
Maybe, someone finds this interesting.
Our paying customers must have this
Some updates: Over the last weeks, we worked on a cleanup of the code (mostly acs-core and the packages used on openacs.org), removing calls to deprecated functions and replacing old-style Tcl with the feature set available in Tcl 8.5. Furthermore, the actual code on openacs.org implements ad_proc based on nsf::proc when available. nsf::proc is part of nsf/XOTcl2 and provides a c-based argument parser as a replacement of the classical tcl-level parser in ad_proc.
Other changes address some infrequent very long calls stemming from (a) dns hangs, (b) sometimes extremely slow calculation of the summarizing function in the listing of the search results, and (c) from a broken entries in forums on openacs.org (one entry had a tree_sortkey of NULL, returning a "thread" of all postings. The last change was just a site bug.
As a consequence of these changes, the average run time of requests on openacs.org improved by about 30% since the posting above, namely from 0.0676 to 0.0472 seconds per request
requests 136473, queued 10 (0.01%), spooled 48751 (35.72%)
avg queue time 0.0001s, avg filter time 0.0032s, avg run time 0.0472s
The times reported by munin (moving average) are now in the range of 50ms down from 80ms in August and down from 325ms before (more than a factor of 6 faster). Outlines count more in the calculation of munin, since a slow call in a 5 min interval with few other requests counts more than e.g. in a period with frequent fast requests.
The improvements might be even better, since i was dropping in the setup of openacs.org some very fast calls (using large expire times for "resources"); these fast but unnecessary calls would improve the numbers further.
I am quite happy with these results. The frequency of the changes in the oacs-5-8 branch from my side will go down, since i am mostly through with the tcl-8.5 instruction changes (and the courses of the semester start).
- OpenACS.org runs the freshest code from the oacs-5-8 branch and uses not only acs-core, but as well several packages, in which we fixed many bugs not contained in the bugtracker. These changes led to much lower numbers of error entries in the error.log: 100 days ago (using aolserver + OpenACS 5.5 + pg 8.3.5) we saw abut 17.000 errors per day, last week, we were at 3000, yesterday at 428 (many of the remaining errors are from from invalid urls, etc.).We are still improving on openacs.org: yesterday, we had e.g. 167 errors in the error.log. Most of these errors look like caused from broken web clients, sending requests with unresolved entities in hrefs (like eg. "..../?project_id=56106&f_action_20=8949&f_state=7" instead of "..../?project_id=56106&f_action_20=8949&f_state=7"). ad_page_contract complain in such cases about "nonstandard variable names".
On google's webmaster tools, we are down from 2966 errors (on Aug 6th) to 0 errors (since Sept 21th).
In Aug. and Sep. we did some more configuration and update work which are visible in the diagram (e.g. concerning the non-core packages used on this site). Since Oct. the response times are very stable, serving about the same load as one year earlier. Average response time of the server is below 50ms, although several fast requests were eliminated (e.g. by setting expiration).
all the best
Did you find out what was going on (looking at by-year graph) in the first week of July, where the graph looked like it was peaking?