Forum OpenACS Development: Varnish HTTP Reverse Proxy for XoWiki on project-open.org - Experiences and Remaining Issues

Hi,

We've been running XoWiki as part of our high-traffic www.project-open.org site for about two years. The server gets some 300.000 Google "impressions" per month, some 50.000 unique visits per month and up to 100 hits per second.

However, Google Webmaster Tools has recently been complaining about slow load times, and we have also had some overload issues, so we are looking for solutions that don't require buying new hardware.

So I recently checked "Varnish Cache" (www.varnish-cache.org) and found the necessary features to work around OpenACS/AOLserver HTTP headers that prevent SQUID from working as a reverse proxy.

I haven't 100% understood the server yet, but I was able to create a working .vcl config file with selective caching for XoWiki and some resource directories (95% of all HTTP requests) while maintaining the interactive behavior of the rest of the system.

However, Varnish still doesn't seem to cache the home page and other pages in the root directory (/www in AOLserver). Has anybody had experience with this?

The first tests I made with the configuration showed a factor 10x speedup in content delivery (~0.04 seconds with VarnishD vs. 0.4 seconds directly from AOLserver), and I believe the overall effect will be even higher during moments of high traffic.

Here is the config file I used:

backend default {
  .host = "127.0.0.1";
  .port = "8000";
}

sub vcl_recv {
  # doesn't work!
  if (!(req.url ~ "/")) {
    return (lookup);
  }
  if (req.url ~ "^/xowiki/") {
    return (lookup);
  }
  if (req.url ~ "^/images/") {
    return (lookup);
  }
  if (req.url ~ "^/resources/") {
    return (lookup);
  }
  if (req.url ~ "^/intranet/images/") {
    return (lookup);
  }
  if (req.url ~ "^/intranet/style/") {
    return (lookup);
  }
  if (req.url ~ "^/intranet/css/") {
    return (lookup);
  }
  if (req.url ~ "^/intranet/js/") {
    return (lookup);
  }
}

sub vcl_fetch { return (deliver); }
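
To see whether a given URL is really served from the cache, a small check along the following lines may help. It is only a sketch: it relies on the X-Varnish response header that Varnish adds by default (two transaction IDs on a cache hit, one on a miss), and the helper names cache_status and check_url are made up for illustration. One thing worth keeping in mind when reading the results: an unanchored pattern "/" matches every request URL, so a rule meant for the home page alone would need an anchored pattern such as "^/$". Whether that explains the home page behavior here is not something this check settles by itself.

```shell
#!/bin/sh
# Classify a response as HIT or MISS from its headers (read on stdin).
# Assumption: Varnish's default X-Varnish header is present, carrying
# two transaction IDs on a cache hit and one on a miss.
cache_status() {
  awk '{ sub(/\r$/, "") }                     # strip the CR of CRLF line ends
       tolower($1) == "x-varnish:" {
         found = 1
         status = (NF >= 3) ? "HIT" : "MISS"  # header name + 2 IDs = hit
         print status
       }
       END { if (!found) print "NO-VARNISH" }'
}

# Hypothetical helper: GET a URL, discard the body, dump the response
# headers to stdout and classify them.
check_url() {
  curl -s -o /dev/null -D - "$1" | cache_status
}
```

Calling check_url twice on the same address that Varnish listens on should print MISS and then HIT for a cacheable page; NO-VARNISH means the response did not pass through Varnish at all.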

So I'd be very interested to learn about other experiences using Varnish...

Cheers,
Frank

<blockquote> The server gets some 300.000 Google "impressions" per month
</blockquote>

This sounds like a strange kind of metric. What do you mean by this? Hits by the google bot? 
E.g. OpenACS.org gets about 670.000 google hits per month, and has rather moderate traffic.

<blockquote> ...  found the necessary features to work around OpenACS/AOLserver HTTP headers that prevent SQUID from working as a reverse proxy
</blockquote>

What does this mean? We use OpenACS on most sites behind a reverse proxy (nginx).

<blockquote> ... factor 10x speedup in content delivery (~0.04 seconds with VarnishD vs. 0.4 seconds directly from AOLserver)
</blockquote>

If your AOLserver takes 0.4 seconds to deliver files, something is really broken in your installation.
Many years ago, naviserver already easily reached >1200 requests per second (that is 0.00083 secs per request), see e.g.:
http://www.mail-archive.com/naviserver-devel@lists.sourceforge.net/msg02038.html

On today's hardware the performance will certainly be much better. If we look at the traffic of our production site
and measure just the dynamic pages (no images, css, .js included, which are much faster), the average response time
is between 0.06 and 0.09 seconds on four-year-old hardware, with currently more than 2500 users active
(>12000 currently logged in). Note that delivering a dynamic page typically involves many SQL queries (often 30+).
Comparing dynamic pages with the delivery of a cached page without permission checking etc. is comparing apples with pears,
... but maybe the application running within aolserver is slow.

Are you running your server in a virtual machine? Maybe the configuration of this machine is in a bad state...

-gustaf neumann

Hi Gustaf,

Thanks for answering!

> strange metric

I was looking at Google Webmaster tools, and that's the metric they are using there.

> nginx

Right, I've seen that one (although I've never used it). Most ]po[ servers are behind Pound. However, Pound (and nginx, AFAIK) doesn't have caching features. Varnish, in contrast, allows us to keep the /images/ folder etc. completely in memory.

> broken in your installation

I know that you manage to get really high speed on your IBM hardware, but I've never seen XoWiki pages come out faster than 250ms on any (Intel) hardware, with average speed around 300-400ms. I'm talking about installations directly on hardware, not virtualized environments.

I'm using

time wget http://localhost:30140/xowiki/
for measuring the entire content delivery though, which might include a lot of latency.
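
A single "time wget" run also counts process startup and writing the downloaded file, so a repeated measurement that separates time to first byte from total time might give a clearer picture. The following is a rough sketch under that assumption; the function names measure and average are hypothetical, while time_starttransfer and time_total are standard curl --write-out variables.

```shell
#!/bin/sh
# Print time-to-first-byte and total time (in seconds), one line per
# request, for N GET requests against a URL (N defaults to 10).
measure() {
  url=$1
  n=${2:-10}
  i=0
  while [ "$i" -lt "$n" ]; do
    curl -s -o /dev/null -w '%{time_starttransfer} %{time_total}\n' "$url"
    i=$((i + 1))
  done
}

# Average the two columns produced by measure (read on stdin).
average() {
  awk '{ ttfb += $1; total += $2; n++ }
       END { if (n > 0) printf "ttfb=%.3f total=%.3f\n", ttfb / n, total / n }'
}
```

For example, "measure http://localhost:30140/xowiki/ 20 | average" prints one line with both averages over 20 requests. Running it once against AOLserver directly and once through the proxy keeps the comparison on equal footing.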

We'd be happy to pay you for half a day of consulting (at whatever your rate is...) if you managed to speed up our standard Intel installation (CentOS 6.2, PostgreSQL 8.4, AOLserver 4.5.1) by a factor of 2 or more 😊

Cheers,
Frank

> I was looking at Google Webmaster tools, and that's the metric they are using there.
This is still a strange metric; it measures Google visibility rather than how busy the server is.

When one compares openacs (with db access etc.) with a server delivering cached pages, one is comparing the computation of pages on the fly and their delivery with the pure delivery of precomputed pages. But that's not comparing aolserver with the caching proxy, but the OpenACS application vs. the proxy. Actually, for dynamic pages, the caching proxy can't do much unless it is skipping cache validation (which would lead to incorrect deliveries, no user tracking etc.).

> I've never seen XoWiki pages to come out faster then 250ms on any (Intel) hardware
Well, on openacs.org with Ubuntu 10.04.4 LTS + pg835 (intel server), i see on such a test 0.101secs real time, and this is with general comments and various includelets in the side-bar.

time wget https://openacs.org/xowiki/aolserver-install
real	0m0.101s
user	0m0.001s
sys	0m0.002s
The total average on openacs.org is 176ms over the last week. One should note that openacs.org is not a tuned site and is still running a quite old version of OpenACS.

Maybe i'll get in touch when it is less busy on my side to figure out, what makes your site slow ...

all the best
-gustaf neumann

Hi Gustaf,

> skipping cache validation

This is what we're doing for certain pages (CSS, JS, images, ...). We have currently included the actual XoWiki pages as well, because we're using the XoWiki of www.project-open.org basically like a static Web site. However, I'd be delighted if we could remove this setting.

> less busy

That would be great. It's not only one site, but ALL servers I've seen. My offer concerning the consulting time stands 😊

Frank

Hi Frank,

We're using varnish in a completely different (non-openacs, non-aolserver) environment and what we've found is that when it works, it works great, but getting it to that point can be frustrating due largely to the quirky configuration (which has apparently changed a lot with each major release).

An alternate idea: Why not do the page caching completely within aolserver? Set up filters on each end of the request, at the beginning to check if there's a cached response available and at the end to cache the page that was sent. Since you're doing it within the server you have a very flexible and familiar environment to work within to do whatever validation you need or choose to skip.

Here's a little code I put together to do this:

pagecache-init.tcl

ns_log notice "initializing pagecache filters"
ns_cache create pagecache
nsv_set cls pagecache [ns_cls alloc]

ns_register_filter prewrite GET / pagecache_save
ns_register_filter -priority -10 preauth GET / pagecache_check

-----
pagecache-procs.tcl

ad_proc pagecache_save {why} {
    This is the worker proc for a full-page cache.  It's used as a
    very late output filter
} {
    if {[ns_cls get [nsv_get cls pagecache]] eq ""  && [ad_conn url] == "/"} {
        ns_cache set pagecache [ad_conn url] [ns_conn responsecontent]
        ns_log notice "caching [ad_conn url]"
    }
    return filter_ok
}

ad_proc pagecache_check {why} {
    a very early filter for serving pages from cache
} {
    if {[ns_cache get pagecache [ns_conn url] pagedata]} {
        ns_log notice "serving cached page for [ns_conn url]"
        ns_cls set [nsv_get cls pagecache] served
        ns_return 200 text/html $pagedata
        return filter_break
    }
    return filter_ok
}

This setup as written skips all validation for the cached page - no sessions, no logins, no cookies, etc. On the other hand, it's pretty fast. But it's pretty simple to adjust this tradeoff; for example by making the check a postauth rather than preauth you could have user info to check sessions; you're then accepting the cost of checking the user info, but that's likely still much less than the full page cost.

However, there's one big, possibly showstopper caveat: this code relies on functionality that's only available in the bleeding-edge (cvs head) version of aolserver. I certainly wouldn't mind more testing of this functionality and more eyes on the code, but that might not be an acceptable risk for you. But let me know if there's anything I could do to help on this front.

Cheers,
-J