Forum OpenACS Q&A: Response to MS Proxy Server and Caching Problem with OpenACS 4.2 Web Site

Hamilton pointed me to a great web page that explains caching in some detail:
    http://www.mnot.net/cache_docs/

He also pointed me to a cool tool that reports the "cacheability" of a web resource:
    http://www.web-caching.com/cacheability.html

It seems that for dynamic pages, ACS 3/4 does NOT set any freshness information/validator headers such as Expires, Cache-Control, or Last-Modified.  According to the documentation above, such pages should NOT be cached by most proxies:

...if no validator is present, most caches will mark the object as uncacheable...

However, it seems that some proxies insist on caching such content anyway, which explains the strange behavior we are seeing.
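
To make the rule quoted above concrete, here is a rough Python sketch of the kind of check a cache (or the cacheability tool) might do on the response headers.  Python is used only for illustration; the function name describe_cacheability is made up, and real proxies apply many more rules than this (and the aggressive ones clearly ignore it):

```python
def describe_cacheability(headers):
    """Roughly classify a response by the caching headers it carries.

    Simplified illustration of the quoted rule, not how any
    particular proxy is implemented.
    """
    # Header names are case-insensitive, so normalize before lookup.
    h = {name.lower(): value for name, value in headers.items()}
    cache_control = h.get("cache-control", "").lower()

    # Freshness information: how long a cached copy may be reused.
    has_freshness = "expires" in h or "max-age" in cache_control
    # Validators: let a cache ask the origin "has this changed?"
    has_validator = "last-modified" in h or "etag" in h

    if has_freshness:
        return "has freshness information"
    if has_validator:
        return "has a validator only"
    return "no freshness information or validator"
```

A dynamic ACS page, as described above, would fall into the last category, which is the case most caches treat as uncacheable.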

With regard to static content, such as GIFs and vanilla HTML files, AOLServer sets the Last-Modified header.  Most proxies will then validate such content by checking with the origin server to see whether the content has changed, fetching the latest copy if so.  This is a good thing, though some web servers also use the more advanced ETag header, which assigns the content a unique ID that changes whenever the content changes.  AOLServer does not seem to support ETag currently.
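
For anyone who hasn't seen the validation step spelled out, here is a minimal Python sketch of the origin-server side of it, assuming the proxy sends the standard If-Modified-Since request header.  This is not AOLServer's actual code; the function name validate and its arguments are hypothetical:

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def validate(last_modified, if_modified_since):
    """Return (status, headers) for a possibly conditional GET.

    last_modified: timezone-aware UTC datetime of the file's last change.
    if_modified_since: value of the request header, or None if absent.
    """
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since is not None:
        try:
            cached_at = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            cached_at = None  # unparseable date: fall back to a full response
        if cached_at is not None:
            if cached_at.tzinfo is None:
                # Treat a date without a usable zone as UTC.
                cached_at = cached_at.replace(tzinfo=timezone.utc)
            # HTTP dates have one-second resolution.
            if last_modified.replace(microsecond=0) <= cached_at:
                return 304, headers  # Not Modified: cached copy is current
    return 200, headers  # send the full, current content
```

A 304 tells the proxy its copy is still good, so only the headers cross the wire; a 200 means the proxy fetches and stores the new version.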

Though there are more advanced pointers in the above document, here's my understanding of basic caching as it relates to simple ACS application development:

1) For the most part, the current default behavior of AOLServer/ACS is reasonable for most caches.  However, some aggressive proxies seem to cache even dynamic pages despite having no freshness information/validators.  This could be a very bad thing depending on your application.

2) If you wish to make (reasonably) sure that your page is not cached, then, like IIS, you should probably set:
    Expires: Thu, 01 Jan 1998 07:00:00 GMT (or some date in the past)
    Cache-Control: private

You could probably throw in a "Pragma: no-cache" or "Cache-Control: max-age=0" header in addition to, or instead of, the "Expires" header.  However, the above document seems to suggest that "Pragma: no-cache" is NOT honored by many proxies.

BTW, this is basically what Russell said several posts ago (thanks Russell!)

3) If you wish to make sure that your dynamic content is cacheable by proxies (because, say, it doesn't change too often), then either (a) dump the content to a static page whenever it changes and link to the static page, or (b) set an age-related header such as "Expires" or "Cache-Control: max-age=xxx".

4) Bottom line: proxy caching is not a huge issue for dynamic pages, but only when such pages are neither personalized per user nor time-sensitive.  If a dynamic page shows basically the same information for all users and is not time-sensitive (such as, say, a news page), then the current behavior is probably not a big deal, since only some users behind aggressive caches may get stale information from time to time.  However, if the page shows the user's name, is personalized in some other way, or is time-sensitive, getting a cached copy from a proxy is very undesirable behavior and should probably be avoided by the means described above.
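
To make points (2) and (3) concrete, here is a rough Python sketch that builds the two sets of headers.  Python is used only for illustration; this is not ACS or AOLServer code, and the function names no_proxy_cache_headers and cacheable_headers are made up.  The header values in the first function are the ones from point (2):

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def no_proxy_cache_headers():
    """Headers from point (2): discourage proxies from caching the page."""
    return {
        # Any date in the past marks the response as already stale.
        "Expires": "Thu, 01 Jan 1998 07:00:00 GMT",
        # "private" asks shared caches (proxies) not to store the response.
        "Cache-Control": "private",
    }


def cacheable_headers(max_age_seconds, now=None):
    """Headers from point (3): let proxies reuse the page for a while.

    now must be a timezone-aware UTC datetime; defaults to the current time.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    expires = now + timedelta(seconds=max_age_seconds)
    return {
        "Expires": format_datetime(expires, usegmt=True),
        "Cache-Control": "max-age=%d" % max_age_seconds,
    }
```

For example, a news page that changes a few times a day might use cacheable_headers(3600), while anything that shows the user's name would get no_proxy_cache_headers().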

Given the above, I agree with C.R.  I think that ACS/AOLServer should be changed such that, by default, the "Expires" and "Cache-Control" headers are set to something similar to the ones shown above in point (2).

This may seem like a big, scary change, but I don't think it is.  Since the majority of proxies don't currently cache dynamic ACS pages anyway, they won't be affected.  Only the small percentage of aggressive proxies will be affected: they will be prevented from caching dynamic ACS content, which makes them behave like "normal" proxies.  Dynamic pages that can be cached by proxies or that have special caching needs can be handled as described in point (3).

What do you all think?

Thanks...