Forum OpenACS Q&A: Some thoughts on OpenACS caching

Posted by Brian Fenton on

Hi all

I've been testing some of the OpenACS caching mechanisms in our product and thought people might be interested in the results. I particularly like the -cache_key parameter available on some of the db_* procs. Of course, the easy part is caching the database call; the hard part is knowing where to flush the cache.

One proc that I noticed gets called a lot and that isn't using caching is apm_get_installed_versions https://openacs.org/api-doc/proc-view?proc=apm_get_installed_versions&source_p=1&version_id=

Now, db_foreach doesn't support -cache_key, but I tried a quick rewrite using db_list_of_lists, e.g.

ad_proc -public apm_get_installed_versions {
    -array:required
} {
    Sets the current installed version of packages installed on this system
    in an array keyed by package_key.

    @param array Name of array in caller's namespace where you want this set
} {
    upvar 1 $array installed_version

    # cache the query result under the key "installed_packages"
    set lol [db_list_of_lists -cache_key installed_packages installed_packages {
        select package_key, version_name
        from apm_package_versions
        where enabled_p = 't'
    }]

    template::util::list_of_lists_to_array $lol installed_version
}

Some quick timing tests gave an average speed improvement from 90745 microseconds to 603 microseconds on our ancient dev server, which is pretty good. However, I'd need to think further about the cache flush. Generally we would never enable or disable packages without a system restart, so in our case it's probably safe not to worry about flushing the cache.
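For anyone who wants to repeat this kind of measurement, Tcl's built-in time command is enough (a sketch, not my exact test harness):

    # the first call pays for the query and fills the cache;
    # the averaged second run measures the cached path
    ns_log notice "first call:  [time { apm_get_installed_versions -array v }]"
    ns_log notice "cached call: [time { apm_get_installed_versions -array v } 100]"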

It may also be preferable to use util_memoize in this case instead of -cache_key, as there is still the call to template::util::list_of_lists_to_array on every request, but either way I thought it was interesting to report.
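Roughly what I have in mind for the util_memoize variant (an untested sketch; apm_installed_versions_list is just a name I made up for a helper proc):

    # helper whose plain list result can be memoized as a value
    ad_proc -private apm_installed_versions_list {} {
        Returns package_key/version_name pairs of the enabled package versions.
    } {
        return [db_list_of_lists installed_packages {
            select package_key, version_name
            from apm_package_versions
            where enabled_p = 't'
        }]
    }

    # in apm_get_installed_versions the body would then become:
    #     set lol [util_memoize apm_installed_versions_list]
    #     template::util::list_of_lists_to_array $lol installed_version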

Posted by Gustaf Neumann on

I particularly like the -cache_key parameter available

Well, the db_* interface with "-cache_key" is not a recommended interface, since it does not scale and leads to many locks. Flushing of this cache is either brute force or not happening at all. In the sketched case, as you noted, the cache has to be flushed (on the full cluster) whenever a package is enabled (or installed) or disabled.
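For completeness, the flush for this entry would look as follows; the pattern contains no wild card, so it removes just this one entry:

    # to be called whenever a package is enabled, installed or disabled
    db_flush_cache -cache_key_pattern installed_packages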

If you do not care about getting correct results in these situations, you can use the version below on oacs-5-10, which uses a lock-free (per-thread) cache and is likewise never flushed.

one proc that I noticed gets called a lot ...

Why does apm_get_installed_versions get called a lot on your site? On stock instances, this is only called by the package installer...

-g

ad_proc -public apm_get_installed_versions {
    -array:required
} {
    Sets the current installed version of packages installed on this system
    in an array keyed by package_key.

    @param array Name of array in caller's namespace where you want this set
} {
    upvar 1 $array installed_version

    array set installed_version [acs::per_thread_cache eval -key acs-tcl-apm_get_installed_versions {
        db_list_of_lists -cache_key installed_packages installed_packages {
            select package_key, version_name
            from apm_package_versions
            where enabled_p = 't'
        }
    }]
}
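The call site stays the same; for example (a hypothetical caller, using a stock package key):

    # the caller's array is filled exactly as before; only the lookup path changes
    apm_get_installed_versions -array versions
    ns_log notice "acs-kernel is at version $versions(acs-kernel)"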
Posted by Brian Fenton on
Hi Gustaf

Many thanks for the information - as always, very informative. I don't understand what you meant by "does not scale and leads to many locks. Flushing of this cache is brute-force or not happening at all." If we are not flushing, then this shouldn't be a problem, right? In the case where we are flushing, would it be a reasonable approach to use a different cache pool for each db_* call we want to cache? That would create a different ns_cache under the hood, right?
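To make my question concrete, I mean something like this (just a sketch: installed_packages_cache is a made-up pool name, and I'd still need to check whether -cache_pool is supported on our version and whether that ns_cache has to be created at server start):

    # hypothetical dedicated pool for just this query, instead of the shared db_cache_pool
    set lol [db_list_of_lists -cache_pool installed_packages_cache \
                 -cache_key installed_packages installed_packages {
        select package_key, version_name
        from apm_package_versions
        where enabled_p = 't'
    }]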

Does the util_memoize approach also have this problem?

Thanks for the code sample - sadly our OpenACS version doesn't have this feature.

I checked to see why we are running apm_get_installed_versions so much, and it looks like one of my colleagues added that in some custom code, so yes you are correct that it's not stock code.

Brian

Posted by Gustaf Neumann on
does not scale ...

On large OpenACS installations (many cache entries, many threads), performance will degrade substantially.

... and leads to many locks. Flushing of this cache is brute-force

The reason is wild-card flushes like the ones below:


./new-portal/tcl/portal-procs.tcl: db_flush_cache -cache_key_pattern portal::get_page_header_stuff_${portal_id}_*
./acs-subsite/tcl/application-group-procs.tcl: db_flush_cache -cache_key_pattern application_group_*

The problem is that with a substantial cache (e.g. 100K cache entries), every wild-card flush requires the following steps (see the code sketch after the list):

1. Lock the cache
2. Transfer all cache keys (e.g. 100K) into the Tcl interpreter
3. Iterate over every cache entry and perform a "string match" operation
4. Flush the matching entries
5. Unlock the cache
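In code, such a wild-card flush boils down to roughly the following (an illustrative sketch using the standard ns_cache commands; the real implementation performs these steps under a single cache lock, which is exactly the problem):

    proc wildcard_flush {pool pattern} {
        # transfer all cache keys into the Tcl interpreter
        foreach key [ns_cache_keys $pool] {
            # compare every key against the pattern with "string match"
            if {[string match $pattern $key]} {
                # flush the matching entries one by one
                ns_cache_flush $pool $key
            }
        }
    }
    # roughly what "db_flush_cache -cache_key_pattern application_group_*" amounts to:
    #     wildcard_flush db_cache_pool application_group_*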

The more entries, the longer the lock is held. When multiple such flushes happen, the second one has to wait until the first one finishes before it can get the lock, so the waiting times pile up. Think of t1, t2, t3, ... as mutex lock operations issued from multiple threads, each queuing behind the previous one.

Note that these locks bring the affected threads to a full stop, and even fast calls have to wait. I have seen real-world examples where such locks take 100 ms or longer, which brings the server to a crawl. Even in current versions of OpenACS it is not unusual to see many (100+) locks on the db_cache_pool.

The main problem is the wild-card flushes. Even with the cache key you are using (no wild cards), some other cache-flushing operation can cause a long lock and make your requests slow.

The solution is to avoid wild-card flushes (which is sometimes hard for db queries) and to use partitioned caches. I will probably say something about this at the forthcoming OpenACS conference.
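To illustrate the idea of a partitioned cache (a conceptual sketch with made-up names, not the actual partitioned cache implementation in OpenACS): the keys are spread over several small ns_caches, so a flush or a long lock only ever affects one partition.

    # create 4 small caches instead of one big one (sizes in bytes)
    for {set i 0} {$i < 4} {incr i} {
        ns_cache_create demo_cache_$i 200000
    }

    proc demo_cache_eval {key script} {
        # route the key to one partition via a cheap hash (crc32 from Tcl's zlib);
        # a flush or lock in one partition does not block lookups in the other three
        set partition demo_cache_[expr {[zlib crc32 $key] % 4}]
        return [ns_cache_eval $partition $key $script]
    }

    # usage, e.g.: demo_cache_eval person_name_42 [list person::name -person_id 42]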

Does the util_memoize approach also have this problem?

Yes, pretty much. This applies to wild-card flushes on all ns_caches, including util_memoize, which is a kitchen-sink cache. You might have noticed that the usage of the util_memoize cache has been reduced significantly over the last years. It is much better to have multiple specialized caches than one large cache.

sadly our OpenACS version doesn't have this feature.

If you are concerned about performance, upgrading will improve it. Caching has been greatly improved over the last years.

Hope this explains it.
-g

Posted by Brian Fenton on
That was very very helpful, I understand the problem better now. Thank you so much, Gustaf.

My takeaway from your comments is that if I never flush, then it's safe to use this caching.
And if I do need to flush, it's best to use a partitioned cache, i.e. a dedicated cache pool, and to keep it as small as possible.
I will also take a look at our usage of util_memoize and see if there are any potential issues there.

Upgrading our system would be wonderful, but that ship has sailed unfortunately.

Best wishes
Brian

Posted by Gustaf Neumann on
My takeaway from your comments is that if I never flush ...

The problem is not a cache flush in general, but wild-card flushes (which have to iterate over all the cache keys). Also note that wild-card flushes on a cache stop all other operations on that cache as well (e.g. the flush from portal-procs above will also block "your" harmless operation).

... then it's safe to use this caching.

Whatever "safe" means: when something is cached from the database and the database is updated but the cache is not flushed, stale (i.e. incorrect) results will be reported.

In general, it is better to use specialized caches with precise flush semantics (typically via an API), not based on wild-card flushes. Many small caches are also better, since this leads to better concurrency. If there is only one cache and this cache is locked, then everything comes to a halt. On our LEARN system we have on average 320 locks per page view; every one of these can bring all running threads to a full stop.
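As a sketch of what "precise flush semantics via an API" can look like (generic example with hypothetical names, not code from acs-core): a small dedicated cache plus explicit get/flush procs, so callers never build cache keys themselves and no wild-card pattern is ever needed.

    # a small dedicated cache, created once at server start (size in bytes)
    ns_cache_create page_header_cache 100000

    proc page_header_get {portal_id} {
        # compute_page_header stands in for the real (expensive) computation
        return [ns_cache_eval page_header_cache $portal_id \
                    [list compute_page_header $portal_id]]
    }

    proc page_header_flush {portal_id} {
        # flushes exactly one entry; no iteration over all keys, no wild card
        ns_cache_flush page_header_cache $portal_id
    }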

that ship has sailed unfortunately.

If your application is performance sensitive, an upgrade will help. Since acs-core has been adapted to work with Oracle 19c, upgrading your core should be feasible. This is probably less effort than cherry-picking changes to improve performance, increase security, etc. Let me know if I can help....

all the best -g

Posted by Brian Fenton on
Thanks for the clarifications. Understood.

Brian