Forum OpenACS Development: One week of using naviserver on openacs.org

Dear all, there is a short summary of the performance changes on openacs.org after moving from AOLserver 4.5.1/OpenACS 5.6/pg 8.3 to NaviServer/OpenACS 5.8/pg 9.2 based on a week of "normal" traffic (here are the details of the configuration).

Aggregated performance analysis

Performance last year (see year chart, column Avg): avg response time: 325ms/req
Performance last week (see year chart, column Avg): avg response time: 81ms/req

This means that under the same load, on the same machine, the new installation is about 4 times faster.


In the chart below, one can see clearly the performance increase in the drop of the response-time line in the month-graph in week 31.

Detail analysis of openacs.org

The detail analysis shows, that we have an average filter time (spent in the NaviServer filters for cookie management, site-node resolving of 3.6ms and for handling the page content of 67.5ms). 0.4% of the requests were queued, all other requests were directly assigned to connection threads. About 30% of the requests were spooled (i.e. sent via async delivery through the writer threads). Request-dispatch in new in NaviServer since 4.99.5, async delivery is a well not available in aolserver. The detailed timing information is as well a NaviServer feature.

Request Handling: requests 447683, queued 1796 (0.40%), spooled 141218 (31.54%)
Request Timing: avg queue time 0.0024s, avg filter time 0.0036s, avg run time 0.0676s

The performance of openacs.org is completely un-tuned, based just on the openacs version 5.8 from CVS head.

Detail analysis of dotlrn.org

Quite interesting is dotlrn.org, which is running as a virtual server under openacs.org (the same nsd process handles openacs.org and dotlrn.org; this configuration was not possible under aolserver). dotlrn.org serves essentially static content using the native content delivery mechanism of NaviServer (i.e. just fastpath, not using the openacs request handler). dotlrn.org recieves 15% of the requests of openacs.org. Based on the 63357 requests in 6 days, the average runtime per request is for dotlrn.org 0.2ms, that is on average 300 times faster than the runtime on openacs.org and 18 times faster than the filter time alone.

Request Handling: requests 63357, queued 10 (0.02%), spooled 6212 (9.80%)
Request Timing: avg queue time 0.0001s, avg filter time 0.0000s, avg run time 0.0002s

The performance improvements are partly due to (a) naviserver, (b) PostgreSQL 9.2 and (c) nsf/xotcl 2.0.

For reference, the machine has a "Intel(R) Xeon(R) CPU X5650 @ 2.67GHz" using Ubuntu 10.04 64bit and was bought in 2010.

Maybe, someone finds this interesting.
-gustaf neumann

Collapse
Posted by Brian Fenton on
Very interesting indeed! Thanks Gustaf. We really must have a look at this.
Collapse
Posted by Frank Bergmann on
Very interesting.
Our paying customers must have this 😊
Collapse
Posted by Gustaf Neumann on
Dear all,

Some updates: Over the last weeks, we worked on a cleanup of the code (mostly acs-core and the packages used on openacs.org), removing calls to deprecated functions and replacing old-style Tcl with the feature set available in Tcl 8.5. Furthermore, the actual code on openacs.org implements ad_proc based on nsf::proc when available. nsf::proc is part of nsf/XOTcl2 and provides a c-based argument parser as a replacement of the classical tcl-level parser in ad_proc.

Other changes address some infrequent very long calls stemming from (a) dns hangs, (b) sometimes extremely slow calculation of the summarizing function in the listing of the search results, and (c) from a broken entries in forums on openacs.org (one entry had a tree_sortkey of NULL, returning a "thread" of all postings. The last change was just a site bug.

As a consequence of these changes, the average run time of requests on openacs.org improved by about 30% since the posting above, namely from 0.0676 to 0.0472 seconds per request

requests 136473, queued 10 (0.01%), spooled 48751 (35.72%)
avg queue time 0.0001s, avg filter time 0.0032s, avg run time 0.0472s

The times reported by munin (moving average) are now in the range of 50ms down from 80ms in August and down from 325ms before (more than a factor of 6 faster). Outlines count more in the calculation of munin, since a slow call in a 5 min interval with few other requests counts more than e.g. in a period with frequent fast requests.

The improvements might be even better, since i was dropping in the setup of openacs.org some very fast calls (using large expire times for "resources"); these fast but unnecessary calls would improve the numbers further.

I am quite happy with these results. The frequency of the changes in the oacs-5-8 branch from my side will go down, since i am mostly through with the tcl-8.5 instruction changes (and the courses of the semester start).

Best regards
-gustaf neumann

Collapse
Posted by Gustaf Neumann on
Just to keep statistics about openacs.org on one place: on https://openacs.org/forums/message-view?message_id=4099229 we had:
- OpenACS.org runs the freshest code from the oacs-5-8 branch and uses not only acs-core, but as well several packages, in which we fixed many bugs not contained in the bugtracker. These changes led to much lower numbers of error entries in the error.log: 100 days ago (using aolserver + OpenACS 5.5 + pg 8.3.5) we saw abut 17.000 errors per day, last week, we were at 3000, yesterday at 428 (many of the remaining errors are from from invalid urls, etc.).
We are still improving on openacs.org: yesterday, we had e.g. 167 errors in the error.log. Most of these errors look like caused from broken web clients, sending requests with unresolved entities in hrefs (like eg. "..../?project_id=56106&f_action_20=8949&f_state=7" instead of "..../?project_id=56106&f_action_20=8949&f_state=7"). ad_page_contract complain in such cases about "nonstandard variable names".

On google's webmaster tools, we are down from 2966 errors (on Aug 6th) to 0 errors (since Sept 21th).

Collapse
Posted by Gustaf Neumann on
It is now nearly 5 months since we switched on OpenACS.org to NaviServer and pg 9.2. The following year-view shows how nicely and stable OpenACS.org runs with the newest snapshot of OpenACS 5.8.0 from cvs (oacs-5-8 branch), which will become OpenACS 5.8.1 in the future.

In Aug. and Sep. we did some more configuration and update work which are visible in the diagram (e.g. concerning the non-core packages used on this site). Since Oct. the response times are very stable, serving about the same load as one year earlier. Average response time of the server is below 50ms, although several fast requests were eliminated (e.g. by setting expiration).

all the best
-gn

Collapse
Posted by Jim Lynch on
Heya Gustaf,

Did you find out what was going on (looking at by-year graph) in the first week of July, where the graph looked like it was peaking?

-Jim

Collapse
Posted by Gustaf Neumann on
At least part of the problem was running openacs.org with OpenACS 5.6 and pg 8.4 for a short time. Since PostgreSQL 8.3 reached EOL in Feb 2013, we did the upgrade to 8.4 in July, hoping that the site is small enough and the machine is fast enough not to face the well known problem with the query optimizer changes in pg 8.4 (see e.g. https://openacs.org/forums/message-view?message_id=3804765). This bad performance was the final driver to push towards OpenACS 5.8.0 addressing these problems.