Forum OpenACS Q&A: Best way to cache a multirow?

Posted by Peter Marklund on
My homepage was being very sluggish so I ran developer support on it. What is striking is that permissions is by far what is hurting OpenACS's performance (response times) the most, and I am very glad that Don's optimizations are under way. For example, on my index page there were two permission_p calls (from the request processor maybe) and each one was taking around 800 milliseconds, alone far more than all other queries on the page combined. I therefore decided to turn on permissions caching despite the warning that some applications access the permissions in the database directly (we really need to fix that). In general we need to make sure that package developers use a Tcl API when they access the database, at least if they are accessing the core datamodel.
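
By the Tcl API I mean something along the lines of the sketch below (I believe these are the switches permission::permission_p takes in acs-tcl; the folder_id is just a placeholder), rather than joining against the permission tables by hand:

    if { [permission::permission_p -object_id $folder_id \
              -party_id [ad_conn user_id] -privilege write] } {
        # render the edit/delete links
    }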

I then turned to my bookmarks index page that still was quite slow. It turned out permissions was the problem again. By removing a check of whether the user can write in any of the displayed folders I cut the response time by seconds. The bookmarks select query still takes about 400 ms though and I'd like to remove that if possible.

So, finally, my question - what is the best way to cache a multirow? I know Don has used caching for the db API for Greenpeace - can we use that already, or should I use a different approach, such as manually converting the multirow to a list and caching that?

Thanks!

Posted by Jeff Davis on
Peter, I think if permission checks are taking more than about 50ms there is something wrong with your db. On my scalability box with 40k users and 350k objects it takes 33ms. Are you sure you have vacuumed/analyzed your db?
Posted by Peter Marklund on
Jeff,
thanks for the tip about vacuuming, I should do that nightly, right?

On the topic of caching, I just discovered the Cache control page where you can see everything cached by util_memoize (find it at /acs-admin/cache). I added a link to the page so that people can find it.
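
For anyone who hasn't used it, util_memoize just caches the return value of a Tcl script, so caching an expensive count looks roughly like this (bm_count_folders is a made-up proc name):

    # Evaluate the script on a cache miss and keep the result for 10 minutes
    set n_folders [util_memoize [list bm_count_folders $viewed_user_id] 600]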

We are going to be using ns_cache quite heavily now for the workflow Tcl API.

Posted by Jeff Davis on
Nightly is probably fine. That's what I usually do. If you have a table with a lot more turnover you might want to vacuum it a little more often (not common on personal sites).

On ns_cache, one thing to note is that the default cache size is 200k, I think, which is insanely small. We should bump it to 10MB or something.
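
In other words, whoever creates the cache should pass a sensible -size (in bytes); the usage itself is simple. A rough sketch (the cache name and key are just examples):

    # Create a 10MB cache at server startup ...
    ns_cache create workflow_cache -size [expr {10 * 1024 * 1024}]

    # ... then per request, the script runs only on a cache miss
    set title [ns_cache eval workflow_cache title_$object_id {
        db_string get_title "select title from acs_objects where object_id = :object_id"
    }]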

The acs-admin/cache stuff is nice. It came from the worldbank stuff -- thanks evil NGO bent on world domination :) I forgot to merge it over to 4.6 (which I just did).

Posted by Don Baccus on
For performance it is always going to be faster to check permissions directly in queries, for instance when querying for all the content in a content folder that one has permission to read.

But for high-level permission checks on a single object, yes, the Tcl API should be used.

If you want to play with the caching db_* code I wrote for Greenpeace I'll send you a copy.  I don't know if it's the best way to cache a multirow but you can't beat "db_multirow -cache" for simplicity.

I would like to generalize that code a bit more so it could be more useful.  Caching is easy ... the hard problem is synchronizing the cache with db contents in a reasonably easy-to-use way, and there's a follow-on, relatively simple problem to solve, namely plugging into the code that memoize now uses to keep cache contents consistent across a server cluster.
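
With the memoize machinery everybody already has, the two halves look roughly like this (bm_bookmark_list is hypothetical); the hard part is making sure every code path that writes the data does the flush, and doing so on every server in a cluster:

    # Cache the result, keyed on the exact script ...
    set bookmarks [util_memoize [list bm_bookmark_list $user_id] 600]

    # ... and flush that same key wherever the underlying rows change
    util_memoize_flush [list bm_bookmark_list $user_id]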

Posted by Peter Marklund on
For the record, I ran vacuumdb and observed this monstrous query on the index page of Bookmarks:

      select count(*) from bm_bookmarks
      where owner_id = :viewed_user_id
      and folder_p = 't'
      and acs_permission__permission_p(bookmark_id, :browsing_user_id, 'write')

The query went from 6 to 4 seconds, not quite the improvement I had hoped for. Vacuumdb does make a significant difference though and I'll start doing it nightly. Admittedly, the server has an old AMD processor at 400 MHz or so. I have plenty of RAM though, according to top about 290 MB free.

Don, could you send me the db -cache code? I'd be very curious to see if I could use it.

Thanks!

/Peter

Posted by Jeff Davis on
You should also check that you have increased your shared buffers on PostgreSQL and that the db cache is populated. It might be that in practice it is much faster once it has run a couple of times.

What's the result of "explain analyze select ..." on that query?

Posted by Peter Marklund on
Thanks Jeff,
I'll check on the PG cache and shared buffers.

I experimented with caching the query on the Bookmarks index page. I used util_memoize in conjunction with template::util::list_to_multirow and template::util::multirow_to_list to do the caching and it works fine.
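
Roughly, the page code now looks like this (the proc and the column list are simplified for illustration):

    # Proc whose result gets cached: run the query into a multirow,
    # then flatten it to a plain Tcl list.
    ad_proc bm_bookmarks_list { user_id } {
        Return the user's bookmarks as a list suitable for caching.
    } {
        db_multirow bookmarks select_bookmarks "
            select bookmark_id, local_title, complete_url
            from   bm_bookmarks
            where  owner_id = :user_id"
        return [template::util::multirow_to_list bookmarks]
    }

    # On the page: fetch the (possibly cached) list and rebuild the multirow
    template::util::list_to_multirow bookmarks \
        [util_memoize [list bm_bookmarks_list $user_id] 600]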

One thing we could optimize though is that the list representation of a multirow (that I store in the cache) has the column names in every row (every row is an array list). I think this is significant overhead if you have verbose column names and many rows. It would suffice to store the column names at index 0 of the list.

Posted by Tom Jackson on

I wrote a proc that does just that. Jeff suggested that we have some kind of switch on list_to_multirow that would allow my code to be used when desired.

Here's the code (which was in the util-procs.tcl file):

ad_proc -public template::util::list_to_multiple { __multiple __nameList __rowList { __level 1 } } {

    Converts list values to multiple tag format. Appends rows to
    existing multiple.

    @param __multiple multiple tag name attribute
    @param __nameList list of array element names
    @param __rowList  flat list of array element values
} {
    upvar $__level ${__multiple}:rowcount __rowcount

    if {[info exists __rowcount]} {
        set __i [expr {$__rowcount + 1}]
    } else {
        set __i 1
    }

    # Each pass pulls one row's worth of values out of __rowList and
    # assigns them to the variables named in __nameList
    foreach $__nameList $__rowList {
        upvar $__level ${__multiple}:${__i} __localArray
        set __localArray(rownum) $__i
        foreach __element $__nameList {
            set __localArray($__element) [set $__element]
        }
        incr __i
    }

    set __rowcount [expr {$__i - 1}]
}
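
To illustrate, you hand it the column names once and one flat list of values for however many rows you have (the values are made up):

    template::util::list_to_multiple bookmarks \
        {bookmark_id local_title} \
        {42 {OpenACS home} 43 {AOLserver docs}}

That builds bookmarks:1, bookmarks:2 and bookmarks:rowcount in the caller, which is what the multiple tag expects.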




Posted by Roberto Mello on
Peter,

You mention you ran vacuumdb. What flags did you pass it? You should pass at least the -z (ANALYZE) flag. If you're only going to do it nightly, you should probably pass the -f (FULL) flag as well.

But like Jeff mentioned, it is also really important to bump the PG defaults in postgresql.conf (shared_buffers, sort_mem, and, of special relevance to OpenACS, geqo_threshold). If you increase those, you'll most likely need to bump up your kernel's shared memory limit (see the first reference).

We should summarize all of this in a tuning doc. Here are some references that I've collected over the years:

Optimizing PostgreSQL
PostgreSQL Performance Tuning
PostgreSQL Database Performance Tuning
PostgreSQL Hardware Performance Tuning by Bruce Momjian
How-To: Replacing Slow Loops in PostgreSQL by Joel Burton

-Roberto