The "caches" for the site map and localization messages use nsv. The only decent cache statistics are implemented at the C level, and only for ns_cache. In addition to nsv and ns_cache we have multiple stores with somewhat similar properties but different interfaces and different degrees of C-level support: ns_section/ns_param, the per-thread ns_set stores, Tcl variables, and associative arrays (multiplied across namespaces).
So, while it is possible to measure the amount of data that is put into these stores, it is not possible in the general case (without significant C programming) to figure out what this content actually consumes per store. For example, storing one byte in a Tcl variable inside a namespace needs significantly more than one byte, since it requires the variable structure, the variable name, a Tcl_Obj, and possibly a new namespace with its hash tables. Furthermore, a major problem with the current AOLserver memory allocator is that it is optimized for high concurrency, but it tends to fragment after some time, which increases the memory footprint and reduces hardware cache efficiency.
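The "one byte costs much more than one byte" effect is not specific to Tcl; any interpreted language shows it. Here is a minimal Python analogy (not Tcl internals, just the same phenomenon): the object header and the hash table holding the variable dwarf the actual payload.

```python
import sys

# One byte of actual data...
payload = "x"
obj_overhead = sys.getsizeof(payload)       # size of the string object itself

# ...stored under a name in a namespace-like hash table.
store = {"my_var": payload}
table_overhead = sys.getsizeof(store)       # hash table holding the entry

# Both overheads are far larger than the single payload byte.
print(obj_overhead, table_overhead)
```

In Tcl the components differ (Var struct, Tcl_Obj, namespace hash tables), but the ratio of bookkeeping to payload is similarly lopsided, which is why summing the stored values badly underestimates the real memory cost.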
How busy is your site with the 3 GB memory footprint? Is it an Opteron machine? What are your maxthreads and threadtimeout settings? Memory consumption is directly related to the number of threads (connection threads and scheduled threads), since every thread keeps a private copy of the Tcl procs (plus vars). In a typical dotlrn instance there are about 8000 procs. So, if one has e.g. 50 threads (say 40 connection threads + 10 scheduled), this makes 400,000 proc copies....
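The multiplication above can be turned into a rough back-of-the-envelope estimate. Note that the per-proc byte cost below is a purely hypothetical illustrative figure, not a measured value:

```python
# Every thread holds its own copy of every Tcl proc.
connection_threads = 40
scheduled_threads = 10
procs_per_interp = 8000            # typical dotlrn instance (from the text)

total_threads = connection_threads + scheduled_threads
total_proc_copies = total_threads * procs_per_interp
print(total_proc_copies)           # 400000 proc copies

# ASSUMED average cost per proc copy (bytecode, name, Tcl_Obj bookkeeping);
# 2 KiB is a made-up number for illustration only.
bytes_per_proc = 2 * 1024
print(total_proc_copies * bytes_per_proc / 2**20, "MiB")
```

Even with a modest assumed per-proc cost, the per-thread duplication alone accounts for hundreds of megabytes, which is why thread count dominates the footprint.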