Forum OpenACS Development: util_memoize performance

Posted by Malte Sussdorff on
I am wondering when util_memoize will become a performance problem instead of a performance gain.

The reason I ask is that I have around 30,000 entries in the cache, and I am wondering if searching through those in Tcl to get the name of a person is slower than the 1ms DB query to retrieve the row directly.

At what point does it stop making sense? Put differently, what should the guidelines be for using the cache? Obviously it makes sense for an expensive query, but what about something that is not expensive but queried very often (like the person's name)?

Posted by Dave Bauer on
Is this someplace where a thread-specific cache would make sense, with the cache dying at the end of a request?

That is, you need to get the person's name several times to customize a page, in several different packages, etc., that are used to build the final page.

Maybe if the query is 1ms, it doesn't matter at all?

You could also time util_memoize: just call it 1000 times with [time] and find out! :) http://www.tcl.tk/man/tcl8.4/TclCmd/time.htm

Then you'll know if it takes > 1ms or not.
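
A minimal sketch of that timing approach, using Tcl's built-in [time] command; the proc name and person_id here are hypothetical examples, not actual OpenACS calls:

```tcl
# Time 1000 cached lookups with [time]; it reports the average
# number of microseconds per iteration.
# person::name and the person_id 1234 are hypothetical.
set usec_per_call [time {
    util_memoize [list person::name -person_id 1234]
} 1000]
ns_log Notice "util_memoize lookup: $usec_per_call"
```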

Posted by Dave Bauer on
util_memoize is faster

With 10,000 keys, on my slow laptop, it takes 9 microseconds to retrieve an entry.

With 100,000 keys, 10 microseconds to retrieve an entry.

With 1,000,000 keys, 9 microseconds to retrieve an entry.

RAM is cheap. Use util_memoize :)
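
A hedged reconstruction of what such a benchmark might look like (the dummy proc and key count are illustrative, not the actual test that was run):

```tcl
# Fill the cache with n entries via a cheap dummy proc,
# then time 1000 cache hits on one existing key.
proc bench_value {i} {
    return "value-$i"
}

set n 10000
for {set i 0} {$i < $n} {incr i} {
    util_memoize [list bench_value $i]
}

# [time] reports microseconds per iteration for a cache hit.
ns_log Notice [time { util_memoize [list bench_value 42] } 1000]
```

Because the cache is hash-based, lookup time should stay roughly flat as n grows, which matches the numbers above.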

Posted by Tom Jackson on
Yes, thanks for the benchmark: everything has to get into memory at some point; the savings come from getting it from memory that is already loaded, and from not bogging down your database, which also has to put it into memory, etc. (But even your database will cache certain rows, so it isn't always going to disk.)

This is the difference between 'storage' and 'memory'. Storage is dead storage, on disk, far away. Memory is ready to go. One difference between util_memoize and TLS is the slight overhead of the mutex. But AOLserver uses mutex buckets, so you benefit from having only a low number of mutexes to maintain, even for a lot of somewhat independent nsv arrays. However, if everything is in one array, this could lead to slowness. (I think this is right? I don't think nsv buckets divide arrays, but they might.)

Posted by Dave Bauer on
True, Tom, I did not test concurrent accesses to the cache, so I don't know how that performs. I guess you'd never have more concurrent accesses than you have threads, and at 9 microseconds per access I don't see much of a locking issue there. It takes a lot longer to write to the util_memoize cache.

I just tested, and it looks like it takes around 45 microseconds to add an entry to the cache if it's not already there. So it's still very fast to fill the cache, and you are doing that incrementally, not 1,000,000 entries all in one go.

Posted by Malte Sussdorff on
Thanks a lot for those comparisons and that information; it is really helpful. I was concerned that util_memoize could have a negative impact on performance, so it is safe to say it does not. Great!
Posted by Don Baccus on
Lookup isn't a problem; the cache uses Tcl's internal hash functionality.

I'm sure it's a bucket hash, but a well-done bucket hash performs very well, and hashing is used everywhere in Tcl, so I assume a lot of attention has been paid to making it very fast.

Look into the db_* caching I added back in 5.2 or so, originally developed for Greenpeace. It's a bit more straightforward to use for simple caching from the database.

I use it exclusively these days.
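
A hedged sketch of how that db_* caching is typically used, assuming the -cache_key switch on the db API; the statement name, query, and key are illustrative, not taken from this thread:

```tcl
# Fetch a person's name through the db_* API, caching the result
# under a per-person key so repeat calls skip the database.
# Statement name, query, and cache key are illustrative.
set name [db_string person_name {
    select first_names || ' ' || last_name
    from persons
    where person_id = :person_id
} -cache_key person_name_$person_id]
```

The appeal is that the caching lives at the database-call site, so there is no separate memoized proc to maintain.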

What *is* slowish is deleting anything from ns_cache, because you have to loop through the entries. But deletion is much, much less frequent than referencing the information, more or less by definition (don't cache things that aren't referenced frequently!).