Forum OpenACS Development: Response to Cacing of database queries ...

Collapse
Posted by Michael Bryzek on
While I see the benefits of caching database queries, is it really a requirement to provide automatic caching of all queries? On the site I am currently working on (n.b. we are using redhat ccm but I think the problem is the same), our caching needs came down to the following requirements:

  * Cache a portion of a page for some period of time

  * Cache a particular query forever

  * Cache a particular query for some period of time

  * Cache a particular query until some event happens.

The event that indicates the need for cache flusing is not standard and is not trivial in some cases. Our solution to the above three requirements was to use a key based cache with an optional parameter to specify the length of time after which the cache expires. Whenever an event occurs that indicates the need for cache flushing (e.g. creating a new site node), we manually flush the appropriate cache using an appropriate key.

In the OACS world, one possible way to do the same for the queries is to use the unique keys we already have for the queries as the cache keys. When you create a site node, you flush that key from the cache (or at least mark it to be flushed at next access).

In practice, I have found that I really only need/want to cache a few queries - the expensive ones or the ones our users hit the most often. As a developer, i know what these queries are - after all, I already had to tune them :) I would thus be very happy with an API to cache individual queries rather than a parameter to cache all my queries. With this type of design, it is up to the developers to decide explicitly what to cache, for how long, and when to flush.

From the previous posts, we have the following concrete needs:

* Greenpeace: flush individual db queries every 5 minutes

We can provide a -cache_time flag to the db_* procs and simply use the already unique name of the query as a cache key.

  * Permissions: "define a cache pool specifically for permissions, which gets flushed whenever an object's permissions change"

My assumption here is that we would be accessing the permissions we are checking through an existing api (e.g. ad_require_permission or something similar). These methods would then explicitly cache the results of the permission checks using a key which could be deterministically generated from the object_id.

The code which updates the permissions of an object could then flush the cache for the permissions key for that object.

  * Grouping of cached items - if you flush one item, you must flush these other items (from the example on keeping content in-sync with its associated side panels)

This is a pretty tough requirement. A cursory glance through the docs for ns_cache indiciates it is already capable of flushing the cache based on a key pattern. If so, we can meet this requirement by creating composite cache keys and then flushing based on the first part of the key.

  * Flushing site nodes

The strategy here would be similar to permissions. We ensure a deterministic way to get the cache key for a given site node. When a site node is edited, we flush the cache for that key.

What do you think about requiring a bit more work from the developer to remove the need for complex logic pertaining to cache flushing? I haven't thought too much about the details of implementing the above in OACS, but I believe we have enough support with ns_cache to do the above. Please do correct me if I'm mistaken here.