I am experiencing the following issue with the site_node cache:
We are at 350,000 site_node entries, and the cache currently sits at around 1.8 GB of RAM (roughly 5 KB per entry). We could easily beef up our machine for the moment, but with the exponential growth the site is experiencing (of site nodes as well), I am concerned about the sustainability of this approach.
To this end I was wondering why we did not implement a lazy caching mechanism, where the request processor hits the database on the first call for a node and loads whatever it finds into the cache, instead of initializing the whole cache at startup. Given that the full initialization at startup takes around one minute, I assume that hitting the DB on a cache miss would not be too much overhead?
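To make concrete what I mean by lazy loading, here is a minimal read-through sketch in Python. The names (get_site_node, fetch_site_node_from_db) and the dict-backed stand-in for the site_nodes table are my own illustrations, not the actual API:

```python
import threading

# Stand-in for the site_nodes table; in reality this would be a DB query.
_db = {
    "/": {"node_id": 1, "object_id": 100},
    "/forums/": {"node_id": 2, "object_id": 200},
}

_cache = {}          # starts empty: nothing is preloaded at startup
_lock = threading.Lock()

def fetch_site_node_from_db(url):
    # Placeholder for the real query against site_nodes.
    return _db.get(url)

def get_site_node(url):
    # Fast path: a previous request already loaded this node.
    node = _cache.get(url)
    if node is not None:
        return node
    # Slow path: the first request for this URL hits the database
    # and populates the cache for all later requests.
    node = fetch_site_node_from_db(url)
    if node is not None:
        with _lock:
            _cache.setdefault(url, node)
    return node

print(get_site_node("/forums/"))  # cache miss: loads from the "DB"
print(get_site_node("/forums/"))  # served straight from the cache
```

The point is that only nodes that are actually requested ever occupy RAM, and the one-minute startup preload disappears entirely.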
This would also help with clustering: if the second webserver cannot find a site_node that was generated on the first webserver, it would simply fall back to the database. I am not sure whether the cluster problems with the site_node cache have been solved in a decent manner (though I assume so, as I know many of you run cluster setups).
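Continuing the sketch above, the lazy scheme also composes nicely with simple invalidation in a cluster: the server that changes a node only has to tell its peers to drop their local entry, and each peer lazily reloads from the database on the next request. The notify_peers broadcast below is a hypothetical stub (HTTP ping, message bus, whatever), just to show the shape:

```python
def notify_peers(op, url):
    # Hypothetical broadcast to the other cluster members; stubbed out here.
    pass

def invalidate_site_node(url):
    # Drop only the local entry; no need to ship the new value around.
    with _lock:
        _cache.pop(url, None)

def on_site_node_changed(url):
    # Runs on the server that performed the write; each peer runs
    # invalidate_site_node locally and reloads lazily on next use.
    invalidate_site_node(url)
    notify_peers("invalidate_site_node", url)
```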
Sadly this does not work at the moment (I tried skipping the cache load and could no longer access any page). Did no one think it would become necessary, or is it a bad idea for reasons I cannot see at the moment?
P.S.: Having said that, I could probably get some way already by just purging unused entries from the existing site_node cache 😊. Though then I am wondering what to do with the data in the packages. But that is a different story for a different time...
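In case it helps, that cleanup idea essentially amounts to bounding the cache and evicting least-recently-used entries. A tiny sketch on top of the fetch_site_node_from_db stub from above, with the bound picked purely for illustration:

```python
from functools import lru_cache

@lru_cache(maxsize=50_000)  # bound chosen for illustration only
def get_site_node_bounded(url):
    # Every miss (first call, or after eviction) goes back to the database;
    # the least-recently-used entries are dropped once maxsize is reached.
    return fetch_site_node_from_db(url)
```

lru_cache also exposes cache_info() and cache_clear(), which would give a coarse monitoring and flush hook for free.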