Forum OpenACS Development: Site node cache memory problems

Posted by Malte Sussdorff on
I experience the following issue with the site_node cache:

We are at 350000 site node entries and the cache is currently at around 1.8GB of RAM. We could easily beef up our machine for now, but given the exponential growth the site is experiencing (in site nodes as well), I am concerned about the sustainability of this approach.
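A quick back-of-the-envelope check of those figures, assuming "GB" means 2**30 bytes and that every additional site node costs the same as today's average (both are assumptions, not measurements):

```python
# Figures from the post: ~1.8 GB of RAM for ~350,000 site node entries.
# Assumption: 1 GB = 2**30 bytes.
CACHE_BYTES = 1.8 * 2**30
ENTRIES = 350_000

bytes_per_entry = CACHE_BYTES / ENTRIES

def projected_cache_gb(entries: int) -> float:
    """Linear projection: assumes each extra node costs today's average."""
    return entries * bytes_per_entry / 2**30

print(f"{bytes_per_entry:.0f} bytes per entry")
print(f"{projected_cache_gb(1_000_000):.2f} GB for 1,000,000 entries")
```

That works out to roughly 5,500 bytes per cached node, so memory grows linearly with the node count: about 5.1 GB at one million nodes under this assumption.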

To this end I was wondering why we did not implement a lazy caching mechanism, where the request processor hits the database on the first request for a node and loads whatever it finds into the cache, instead of initializing the whole cache on startup. Given that the full initialization on startup takes around 1 minute, I assume that hitting the DB on demand would not add much overhead?
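A minimal sketch of such a read-through ("lazy") cache, under stated assumptions: `load_from_db` is a hypothetical stand-in for the real site_nodes query, and lookups use the exact URL as key, whereas the real request processor resolves a URL to its longest matching node. This is not the existing OpenACS code.

```python
from typing import Callable, Dict, Optional

# Hypothetical record type: a real site node has more fields
# (node_id, object_id, package key, ...); a plain dict stands in here.
SiteNode = Dict[str, object]

class LazySiteNodeCache:
    """Read-through cache: nothing is loaded at startup; each URL is fetched
    from the database on its first lookup and kept for later requests."""

    def __init__(self, load_from_db: Callable[[str], Optional[SiteNode]]):
        self._load_from_db = load_from_db   # hypothetical DB lookup by URL
        self._entries: Dict[str, SiteNode] = {}
        self.db_hits = 0                    # counts round trips to the DB

    def get(self, url: str) -> Optional[SiteNode]:
        if url in self._entries:
            return self._entries[url]
        self.db_hits += 1
        node = self._load_from_db(url)
        if node is not None:    # don't cache misses, so new nodes show up later
            self._entries[url] = node
        return node

    def flush(self, url: str) -> None:
        """Drop one entry, e.g. after the node was renamed or deleted."""
        self._entries.pop(url, None)
```

With a plain dict as the "database", `LazySiteNodeCache({"/dotlrn/": {"node_id": 1}}.get)` answers the first `get("/dotlrn/")` from the DB and every later one from memory; startup cost disappears and memory holds only nodes that were actually requested.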

This would also help with clustering: if the second webserver cannot find a site node that was created on the first webserver, it would simply go to the database. I am not sure whether the cluster problems with the site_node cache have been solved in a decent manner (though I assume they have, as I know a lot of you run a cluster setup).
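The cluster argument can be sketched with two in-process "webservers" sharing one dict that plays the role of the site_nodes table (all names here are hypothetical). A node created on server A is found by server B through the database fallback without any broadcast. Renames and deletions are the part this does not solve: a stale entry already cached on B is a hit and never reaches the DB, so those still need an explicit invalidation on every member.

```python
from typing import Dict, Optional

# Shared "database" of site nodes (URL -> node_id); stands in for the
# site_nodes table that every cluster member can query.
database: Dict[str, int] = {}

class WebServer:
    """One cluster member with its own in-memory, lazily filled cache."""

    def __init__(self, name: str):
        self.name = name
        self.cache: Dict[str, int] = {}

    def create_site_node(self, url: str, node_id: int) -> None:
        database[url] = node_id      # the write goes to the shared DB ...
        self.cache[url] = node_id    # ... and into this server's own cache

    def lookup(self, url: str) -> Optional[int]:
        if url not in self.cache:    # miss: fall back to the database
            node_id = database.get(url)
            if node_id is None:
                return None
            self.cache[url] = node_id
        return self.cache[url]

    def invalidate(self, url: str) -> None:
        """Still required on every member when a node changes or disappears."""
        self.cache.pop(url, None)

a, b = WebServer("a"), WebServer("b")
a.create_site_node("/communities/new/", 42)
print(b.lookup("/communities/new/"))   # found via the DB fallback on b
```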

Sadly this does not work at the moment (I tried not loading the cache and could not access any page). Did no one think it would become necessary, or is it a bad idea for reasons I cannot see at the moment?

P.S.: Having said that, I could probably get part of the way by just cleaning unused entries out of the existing site_node cache 😊. Though then I am wondering what to do with the data in the packages. But that is a different story for a different time...
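One generic way to keep only the entries that are actually used is a bounded cache with least-recently-used (LRU) eviction. This is a swapped-in technique, not something proposed in the thread or present in OpenACS; `loader` is a hypothetical DB lookup, and the capacity would have to be tuned against real traffic.

```python
from collections import OrderedDict
from typing import Callable, Generic, Hashable, Optional, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")

class LRUCache(Generic[K, V]):
    """Bounded read-through cache: keeps at most `capacity` entries and
    evicts the least recently used one when full."""

    def __init__(self, capacity: int, loader: Callable[[K], Optional[V]]):
        if capacity < 1:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._loader = loader                    # hypothetical DB lookup
        self._data: "OrderedDict[K, V]" = OrderedDict()

    def get(self, key: K) -> Optional[V]:
        if key in self._data:
            self._data.move_to_end(key)          # mark as most recently used
            return self._data[key]
        value = self._loader(key)
        if value is not None:
            self._data[key] = value
            if len(self._data) > self.capacity:
                self._data.popitem(last=False)   # evict least recently used
        return value

    def __len__(self) -> int:
        return len(self._data)
```

Memory is then capped at roughly `capacity` times the per-entry cost, at the price of an occasional DB round trip for rarely visited nodes.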

Posted by Jose Agustin Lopez Bueno on
Hi Malte!

The problems with site nodes in a cluster are not solved.
We have 4 nodes in the cluster.

Every time we create new communities we must reload
all the site nodes into the cache. Every instance in the
cluster must refresh its own data, and this process can
take longer than 1 minute. Sometimes all the cluster
nodes are blocked and performance degrades.

select count(*) from site_nodes;
132929
The SELECT takes 10 seconds in Postgres.
Reordering the in-memory Tcl arrays takes up to 1 minute.
So updating the local cache takes more than 1 minute per
node, and more than 1 minute x 4 nodes >= 4 minutes in total.
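The same estimate spelled out, using the upper bounds quoted above (10 s for the SELECT, up to 60 s for reordering the Tcl arrays, 4 members). It assumes the members refresh one after another; if they refreshed in parallel the wall-clock time would be closer to a single member's, but every member would still be busy for that long.

```python
# Figures from the post; REORDER_S is the quoted upper bound ("up to 1 minute").
QUERY_S = 10      # SELECT on site_nodes in Postgres
REORDER_S = 60    # reordering the in-memory Tcl arrays
NODES = 4         # cluster members

per_node_s = QUERY_S + REORDER_S
serial_total_s = per_node_s * NODES   # assumes members refresh serially
print(f"per node: {per_node_s} s, all {NODES} nodes serially: {serial_total_s / 60:.1f} min")
```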

To make the cluster setup usable, we must redesign
the cache refresh policy in the acs-tcl procs, in particular
site_node::update_cache
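One possible direction for such a redesign, sketched under assumptions: instead of every member reloading all 132929 nodes, the member that creates a community sends only the changed entries, and each member patches its local cache. `CacheChange`, `ClusterMember` and `broadcast` are hypothetical names; this is not how site_node::update_cache currently works, and the transport between members is left abstract.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class CacheChange:
    """One site node change to propagate to cluster members (hypothetical message)."""
    url: str
    node_id: Optional[int]   # None means the node was deleted

@dataclass
class ClusterMember:
    name: str
    cache: Dict[str, int] = field(default_factory=dict)

    def apply(self, change: CacheChange) -> None:
        # Touch only the affected entry instead of rebuilding the whole cache.
        if change.node_id is None:
            self.cache.pop(change.url, None)
        else:
            self.cache[change.url] = change.node_id

def broadcast(members: List[ClusterMember], changes: List[CacheChange]) -> None:
    """Stand-in for the cluster transport (HTTP call, message queue, ...)."""
    for member in members:
        for change in changes:
            member.apply(change)
```

Creating a community then costs each member work proportional to the handful of new nodes rather than to the size of the whole site map.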

Best regards,
Agustin