Forum OpenACS Development: Re: OpenACS Performance Tests

Posted by Gustaf Neumann on
Eduardo, 
why are you sticking with the following two lines in your example?

   ns_param maxidle 1000000000
   ns_param maxopen 1000000000

These only add confusion to the discussion.

It is indeed strange that in your configuration only pool1 is not used.
This is not what we observe. Check these two things:

 - what is the result of [db_available_pools ""]
   (e.g. in ds/shell)

 - if the list returned starts with pool1, then something
   in your configuration seems to block pool1. In
   this case, i would recommend adding some trace statements
   with "ns_log notice" to proc do_with_handle in
   00-database-procs to see who is grabbing pool1.
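
   Such a trace could look e.g. like the following line (just a sketch;
   adjust it to the actual proc in your version):

     # sketch: log every handle grab together with the caller
     ns_log notice "do_with_handle: pools=[db_available_pools ""] caller=[info level -1]"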

Concerning your results

   Results for / (XoWiki page)
   0 ms gethandle gethandle (returned pool2)
   ...
   222 Total Duration (ms)

   Results for /dotlrn 
   1327 ms gethandle gethandle (returned pool2)
   ...
   18,977 Total Duration (ms)

First of all, i am wondering which of the 100 requests this output is from. 
Note the difference in gethandle: in the dotlrn case, you were 
already out of handles (the request had to wait 1.3 seconds for the handle).

Second, if i look at the xowiki request, more than 20% of the time comes
from permission checking. 
I would recommend turning permission caching on (kernel parameter). 

If i look at the remaining time, about half of the xowiki test is coming 
from lars-blogger (which you seem to call via your own includelet) and a 
third is coming from categories (xowiki includelet). 

So, these tests only say something about your configuration. The same is 
true for the dotlrn values (for example, the time for the home portal page depends very 
much on the number of communities you are a member of).


Posted by Eduardo Santos on
Hi Gustaf,

I'm still running some other tests, but I've seen something odd: the command db_available_pools "" returned this:

pool2 pool3 pool1

It seems like the pool order is reversed, and I have no idea what is causing it. There's also something that is making my test results a little bit off: the repeated query Operation Blocked thing. As the requests are made simultaneously from the same machine, the system is stopping them, thinking the same query is being sent repeatedly. I have to find a way to disable this feature so the results can be OK.

I'm still working through your other suggestions and I'll post my results here as soon as I can.

Posted by Gustaf Neumann on
A simple approach to fix the weird order might be to change

  ns_section ns/server/${server}/db
    ns_param   pools              "*"
    ns_param   defaultpool        pool1

to 

  ns_section ns/server/${server}/db
    ns_param   pools              pool1,pool2,pool3
    ns_param   defaultpool        pool1
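
After a restart, the check from above in ds/shell should then report the
pools in the intended order:

   db_available_pools ""
   # expected result: pool1 pool2 pool3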

Concerning request monitor:
to deactivate the repeated-operation and blocking stuff, but keep the measuring, 
simplify the method check (in xotcl-request-monitor/tcl/throttle_mod-procs.tcl) to 

   throttle ad_proc check {} {
    ...
   } {
     my get_context
     return 0
   }

I think this should help (simply add this definition to the end of the -procs.tcl file; 
you can simply delete it from there when you are done).

Posted by Eduardo Santos on
Hi Gustaf,

Sorry for the late answer, but I was doing some other things and left the tests a little bit aside. I'm going to have a report really soon about the effect of changing each parameter on performance.

My question now is much more related to the service itself than to the test environment. I'm facing a performance issue right now and I don't know where it comes from. Every once in a while, about every ten minutes to half an hour, I can see a major performance problem. I just can't get any response and the db queries seem to get really slow. Trying to find out what this is about, I've found that, in my log, I can see these messages:


[16/Jun/2008:21:58:57][14985.1296435552][-default:720-] Notice: exiting: timeout waiting for connection
[16/Jun/2008:21:58:57][14985.1317448032][-default:721-] Notice: exiting: timeout waiting for connection
[16/Jun/2008:21:58:57][14985.1401497952][-default:728-] Notice: exiting: timeout waiting for connection
[16/Jun/2008:21:58:57][14985.1300638048][-default:736-] Notice: exiting: timeout waiting for connection
[16/Jun/2008:21:58:57][14985.1275423072][-default:730-] Notice: exiting: timeout waiting for connection
[16/Jun/2008:21:58:57][14985.1174587744][-default:756-] Notice: exiting: timeout waiting for connection
[16/Jun/2008:21:58:57][14985.1393092960][-default:705-] Notice: exiting: timeout waiting for connection
[16/Jun/2008:21:58:57][14985.1103145312][-default:747-] Notice: exiting: timeout waiting for connection
[16/Jun/2008:21:58:57][14985.1334258016][-default:724-] Notice: exiting: timeout waiting for connection
[16/Jun/2008:21:58:57][14985.1342663008][-default:726-] Notice: exiting: timeout waiting for connection
[16/Jun/2008:21:58:57][14985.1107347808][-default:745-] Notice: exiting: timeout waiting for connection
[16/Jun/2008:21:58:57][14985.1292233056][-default:718-] Notice: exiting: timeout waiting for connection
[16/Jun/2008:21:58:58][14985.1204005216][-default:744-] Notice: exiting: timeout waiting for connection
[16/Jun/2008:21:58:58][14985.1174587744][-thread1174587744-] Notice: destroy called, ::bgdelivery ::xotcl::THREAD->destroy (0ms)
[16/Jun/2008:21:58:58][14985.1174587744][-thread1174587744-] Notice: destroy called, ::throttle ::xotcl::THREAD->destroy (0ms, 302ms)
[16/Jun/2008:21:58:58][14985.1275423072][-thread1275423072-] Notice: destroy called, ::bgdelivery ::xotcl::THREAD->destroy (0ms)
[16/Jun/2008:21:58:59][14985.1191397728][-default:759-] Notice: exiting: timeout waiting for connection
[16/Jun/2008:21:58:59][14985.1275423072][-thread1275423072-] Notice: destroy called, ::throttle ::xotcl::THREAD->destroy (0ms, 497ms)
[16/Jun/2008:21:58:59][14985.1288030560][-default:733-] Notice: exiting: timeout waiting for connection
[16/Jun/2008:21:58:59][14985.1103145312][-thread1103145312-] Notice: destroy called, ::bgdelivery ::xotcl::THREAD->destroy (0ms)
[16/Jun/2008:21:58:59][14985.1145170272][-default:713-] Notice: exiting: timeout waiting for connection
[16/Jun/2008:21:58:59][14985.1267018080][-default:742-] Notice: exiting: timeout waiting for connection
[16/Jun/2008:21:58:59][14985.1199802720][-default:702-] Notice: exiting: timeout waiting for connection
[16/Jun/2008:21:59:00][14985.1376282976][-default:735-] Notice: exiting: timeout waiting for connection
[16/Jun/2008:21:59:00][14985.1103145312][-thread1103145312-] Notice: destroy called, ::throttle ::xotcl::THREAD->destroy (0ms, 667ms)
[16/Jun/2008:21:59:02][14985.1325853024][-default:737-] Notice: exiting: timeout waiting for connection
[16/Jun/2008:21:59:03][14985.1212410208][-thread1212410208-] Notice: destroy called, ::bgdelivery ::xotcl::THREAD->destroy (0ms)
[16/Jun/2008:21:59:03][14985.1212410208][-thread1212410208-] Notice: destroy called, ::throttle ::xotcl::THREAD->destroy (0ms, 233ms)
[16/Jun/2008:21:59:03][14985.1393092960][-thread1393092960-] Notice: destroy called, ::bgdelivery ::xotcl::THREAD->destroy (0ms)
[16/Jun/2008:21:59:04][14985.1393092960][-thread1393092960-] Notice: destroy called, ::throttle ::xotcl::THREAD->destroy (0ms, 1050ms)
[16/Jun/2008:21:59:05][14985.1321650528][-default:722-] Notice: exiting: timeout waiting for connection
[16/Jun/2008:21:59:07][14985.1199802720][-thread1199802720-] Notice: destroy called, ::bgdelivery ::xotcl::THREAD->destroy (0ms)
[16/Jun/2008:21:59:08][14985.1199802720][-thread1199802720-] Notice: destroy called, ::throttle ::xotcl::THREAD->destroy (0ms, 1202ms)
[16/Jun/2008:21:59:10][14985.1170385248][-default:717-] Warning: db_exec: longdb 5 seconds nsdb0 dml dbqd.acs-tcl.tcl.security-procs.sec_update_user_session_info.update_last_visit
[16/Jun/2008:21:59:11][14985.1149372768][-default:753-] Notice: exiting: timeout waiting for connection
[16/Jun/2008:21:59:11][14985.1166182752][-default:716-] Notice: exiting: timeout waiting for connection
[16/Jun/2008:21:59:12][14985.1111550304][-default:746-] Notice: exiting: timeout waiting for connection
[16/Jun/2008:21:59:18][14985.1292233056][-thread1292233056-] Notice: destroy called, ::bgdelivery ::xotcl::THREAD->destroy (0ms)
[16/Jun/2008:21:59:18][14985.1334258016][-thread1334258016-] Notice: destroy called, ::bgdelivery ::xotcl::THREAD->destroy (0ms)
[16/Jun/2008:21:59:19][14985.1292233056][-thread1292233056-] Notice: destroy called, ::throttle ::xotcl::THREAD->destroy (0ms, 936ms)
[16/Jun/2008:21:59:20][14985.1334258016][-thread1334258016-] Notice: destroy called, ::throttle ::xotcl::THREAD->destroy (0ms, 2293ms)
[16/Jun/2008:21:59:25][14985.1140967776][-default:731-] Notice: exiting: timeout waiting for connection
[16/Jun/2008:21:59:26][14985.1132562784][-default:712-] Notice: exiting: timeout waiting for connection
[16/Jun/2008:21:59:29][14985.1254410592][-default:738-] Notice: exiting: timeout waiting for connection
[16/Jun/2008:21:59:29][14985.1149372768][-thread1149372768-] Notice: destroy called, ::bgdelivery ::xotcl::THREAD->destroy (0ms)
[16/Jun/2008:21:59:30][14985.1149372768][-thread1149372768-] Notice: destroy called, ::throttle ::xotcl::THREAD->destroy (0ms, 1293ms)
[16/Jun/2008:21:59:30][14985.1338460512][-default:725-] Notice: exiting: timeout waiting for connection
[16/Jun/2008:21:59:31][14985.1182992736][-default:757-] Notice: exiting: timeout waiting for connection
[16/Jun/2008:21:59:34][14985.1161980256][-default:696-] Notice: exiting: timeout waiting for connection
[16/Jun/2008:21:59:35][14985.1271220576][-default:719-] Notice: exiting: timeout waiting for connection
[16/Jun/2008:21:59:35][14985.1098942816][-default:749-] Notice: exiting: timeout waiting for connection
[16/Jun/2008:21:59:36][14985.1317448032][-thread1317448032-] Notice: destroy called, ::bgdelivery ::xotcl::THREAD->destroy (0ms)
[16/Jun/2008:21:59:36][14985.1317448032][-thread1317448032-] Notice: destroy called, ::throttle ::xotcl::THREAD->destroy (0ms, 327ms)
[16/Jun/2008:21:59:40][14985.1098942816][-thread1098942816-] Notice: destroy called, ::bgdelivery ::xotcl::THREAD->destroy (0ms)
[16/Jun/2008:21:59:40][14985.1178790240][-default:741-] Notice: exiting: timeout waiting for connection
[16/Jun/2008:21:59:41][14985.1098942816][-thread1098942816-] Notice: destroy called, ::throttle ::xotcl::THREAD->destroy (0ms, 1219ms)
[16/Jun/2008:21:59:42][14985.1258613088][-default:708-] Notice: exiting: timeout waiting for connection
[16/Jun/2008:21:59:42][14985.1153575264][-thread1153575264-] Notice: destroy called, ::bgdelivery ::xotcl::THREAD->destroy (0ms)
[16/Jun/2008:21:59:44][14985.1304840544][-default:734-] Notice: exiting: timeout waiting for connection
[16/Jun/2008:21:59:44][14985.1166182752][-thread1166182752-] Notice: destroy called, ::bgdelivery ::xotcl::THREAD->destroy (0ms)
[16/Jun/2008:21:59:44][14985.1119955296][-default:750-] Notice: exiting: timeout waiting for connection
[16/Jun/2008:21:59:44][14985.1153575264][-thread1153575264-] Notice: destroy called, ::throttle ::xotcl::THREAD->destroy (0ms, 1685ms)
[16/Jun/2008:21:59:46][14985.1208207712][-default:740-] Notice: exiting: timeout waiting for connection
[16/Jun/2008:21:59:46][14985.1090537824][-default:727-] Notice: exiting: timeout waiting for connection
[16/Jun/2008:21:59:46][14985.1115752800][-default:697-] Notice: exiting: timeout waiting for connection
[16/Jun/2008:21:59:46][14985.1136765280][-default:752-] Notice: exiting: timeout waiting for connection
[16/Jun/2008:21:59:46][14985.1372080480][-default:698-] Notice: exiting: timeout waiting for connection
[16/Jun/2008:21:59:46][14985.1140967776][-thread1140967776-] Notice: destroy called, ::bgdelivery ::xotcl::THREAD->destroy (0ms)
[16/Jun/2008:21:59:46][14985.1262815584][-default:707-] Notice: exiting: timeout waiting for connection
[16/Jun/2008:21:59:46][14985.1166182752][-thread1166182752-] Notice: destroy called, ::throttle ::xotcl::THREAD->destroy (0ms, 2535ms)
[16/Jun/2008:21:59:46][14985.1145170272][-thread1145170272-] Notice: destroy called, ::bgdelivery ::xotcl::THREAD->destroy (0ms)
[16/Jun/2008:21:59:48][14985.1140967776][-thread1140967776-] Notice: destroy called, ::throttle ::xotcl::THREAD->destroy (0ms, 2978ms)
[16/Jun/2008:21:59:50][14985.1145170272][-thread1145170272-] Notice: destroy called, ::throttle ::xotcl::THREAD->destroy (0ms, 4059ms)
[16/Jun/2008:21:59:53][14985.1401497952][-thread1401497952-] Notice: destroy called, ::bgdelivery ::xotcl::THREAD->destroy (0ms)
[16/Jun/2008:21:59:57][14985.1304840544][-thread1304840544-] Notice: destroy called, ::bgdelivery ::xotcl::THREAD->destroy (0ms)
[16/Jun/2008:21:59:57][14985.1401497952][-thread1401497952-] Notice: destroy called, ::throttle ::xotcl::THREAD->destroy (0ms, 4555ms)
[16/Jun/2008:22:00:00][14985.1271220576][-thread1271220576-] Notice: destroy called, ::bgdelivery ::xotcl::THREAD->destroy (0ms)
[16/Jun/2008:22:00:01][14985.1304840544][-thread1304840544-] Notice: destroy called, ::throttle ::xotcl::THREAD->destroy (0ms, 4575ms)
[16/Jun/2008:22:00:02][14985.1288030560][-thread1288030560-] Notice: destroy called, ::bgdelivery ::xotcl::THREAD->destroy (0ms)
[16/Jun/2008:22:00:04][14985.1271220576][-thread1271220576-] Notice: destroy called, ::throttle ::xotcl::THREAD->destroy (0ms, 4110ms)
[16/Jun/2008:22:00:05][14985.1372080480][-thread1372080480-] Notice: destroy called, ::bgdelivery ::xotcl::THREAD->destroy (0ms)
[16/Jun/2008:22:00:05][14985.1178790240][-thread1178790240-] Notice: destroy called, ::bgdelivery ::xotcl::THREAD->destroy (0ms)
[16/Jun/2008:22:00:08][14985.1288030560][-thread1288030560-] Notice: destroy called, ::throttle ::xotcl::THREAD->destroy (0ms, 5566ms)
[16/Jun/2008:22:00:08][14985.1372080480][-thread1372080480-] Notice: destroy called, ::throttle ::xotcl::THREAD->destroy (0ms, 2902ms)
[16/Jun/2008:22:00:08][14985.1178790240][-thread1178790240-] Notice: destroy called, ::throttle ::xotcl::THREAD->destroy (0ms, 2884ms)
[16/Jun/2008:22:00:14][14985.1208207712][-thread1208207712-] Notice: destroy called, ::bgdelivery ::xotcl::THREAD->destroy (0ms)
[16/Jun/2008:22:00:17][14985.1262815584][-thread1262815584-] Notice: destroy called, ::bgdelivery ::xotcl::THREAD->destroy (0ms)
[16/Jun/2008:22:00:17][14985.1182992736][-thread1182992736-] Notice: destroy called, ::bgdelivery ::xotcl::THREAD->destroy (0ms)
[16/Jun/2008:22:00:17][14985.1262815584][-thread1262815584-] Notice: destroy called, ::throttle ::xotcl::THREAD->destroy (0ms, 624ms)
[16/Jun/2008:22:00:18][14985.1208207712][-thread1208207712-] Notice: destroy called, ::throttle ::xotcl::THREAD->destroy (0ms, 3466ms)
[16/Jun/2008:22:00:19][14985.1107347808][-thread1107347808-] Notice: destroy called, ::bgdelivery ::xotcl::THREAD->destroy (0ms)
[16/Jun/2008:22:00:20][14985.1090537824][-thread1090537824-] Notice: destroy called, ::bgdelivery ::xotcl::THREAD->destroy (0ms)
[16/Jun/2008:22:00:24][14985.1107347808][-thread1107347808-] Notice: destroy called, ::throttle ::xotcl::THREAD->destroy (0ms, 4952ms)
[16/Jun/2008:22:00:24][14985.1090537824][-thread1090537824-] Notice: destroy called, ::throttle ::xotcl::THREAD->destroy (0ms, 3785ms)
[16/Jun/2008:22:00:24][14985.1132562784][-thread1132562784-] Notice: destroy called, ::bgdelivery ::xotcl::THREAD->destroy (0ms)
[16/Jun/2008:22:00:25][14985.1086335328][-sched-] Warning: db_exec: longdb 9 seconds nsdb0 dml dbqd.acs-tcl.tcl.security-procs.sec_sweep_sessions.sessions_sweep
[16/Jun/2008:22:00:25][14985.1342663008][-thread1342663008-] Notice: destroy called, ::bgdelivery ::xotcl::THREAD->destroy (0ms)
[16/Jun/2008:22:00:26][14985.1086335328][-sched-] Warning: sched: excessive time taken by proc 4 (11 seconds)
[16/Jun/2008:22:00:28][14985.1132562784][-thread1132562784-] Notice: destroy called, ::throttle ::xotcl::THREAD->destroy (0ms, 3387ms)
[16/Jun/2008:22:00:28][14985.1191397728][-thread1191397728-] Notice: destroy called, ::bgdelivery ::xotcl::THREAD->destroy (0ms)
[16/Jun/2008:22:00:29][14985.1300638048][-thread1300638048-] Notice: destroy called, ::bgdelivery ::xotcl::THREAD->destroy (0ms)
[16/Jun/2008:22:00:30][14985.1191397728][-thread1191397728-] Notice: destroy called, ::throttle ::xotcl::THREAD->destroy (0ms, 1976ms)
[16/Jun/2008:22:00:32][14985.1342663008][-thread1342663008-] Notice: destroy called, ::throttle ::xotcl::THREAD->destroy (0ms, 7163ms)
[16/Jun/2008:22:00:35][14985.1325853024][-thread1325853024-] Notice: destroy called, ::bgdelivery ::xotcl::THREAD->destroy (0ms)
[16/Jun/2008:22:00:37][14985.1300638048][-thread1300638048-] Notice: destroy called, ::throttle ::xotcl::THREAD->destroy (0ms, 7627ms)
[16/Jun/2008:22:00:37][14985.1325853024][-thread1325853024-] Notice: destroy called, ::throttle ::xotcl::THREAD->destroy (0ms, 2453ms)

All of a sudden the XOTcl objects get destroyed, and I have no idea where it comes from. Can you give me some hint about it?

Posted by Gustaf Neumann on
what i can see from your log is that
 - connection threads time out (most likely due to
   your threadtimeout settings), it seems they
   are not fed by new requests.
 - the "destroy called" messages are not serious.
   When a connection thread terminates (the thread is
   destroyed) all the objects it contains are destroyed
   as well. Since the deletion semantics on the C
   level are quite tricky, i left the notice calls
   in. You just see here the messages of the
   thread proxy objects, XOTcl itself is not destroyed.

 - a few db-queries seem to be quite slow:
   
    5 seconds nsdb0 dml dbqd.acs-tcl.tcl.security-
      procs.sec_update_user_session_info.update_last_visit
    9 seconds nsdb0 dml dbqd.acs-tcl.tcl.security-
      procs.sec_sweep_sessions.sessions_sweep
    sched: excessive time taken by proc 4 (11 seconds)

Without more information, it is hard to guess what happens. 
Does the "performance issue" happen during/after the benchmark 
or during normal operation?

To see what's happening in your database, use e.g.
http://search.cpan.org/dist/pgtop/pgtop
http://www.rot13.org/~dpavlin/sysadm.html

For a deeper understanding of postgres semantics
in particular with checkpoints, see
http://www.westnet.com/~gsmith/content/postgresql/

Posted by Eduardo Santos on
Hi Gustaf,

Thank you for your quick answer. I have already installed ptop and some other tools on my machine to monitor PostgreSQL. From the PostgreSQL analysis, it seems like every time I see these messages in my log, all the DB queries get very slow and ptop shows me that they are waiting (they are in the waiting state). From this analysis, my first thought is that these object destroy calls are bringing performance down.

Then I'm asking myself: why does it happen? When I run the benchmark, the system uses all available DB connections and threads. After that, it destroys some of them, which is what the log messages above show. However, while the threads are being destroyed, the system gets so slow that we just can't navigate.

So, there are some thoughts I can get from here:

1 - Is the thread destroy process a performance problem?
2 - Why do the DB connections stay in waiting while this destroy is happening, if I have memory, processor and connections available?

I'll try to take a closer look at the PostgreSQL performance info and try to see something I haven't seen yet. Thank you for your help.

Posted by Gustaf Neumann on
The primary question is to figure out what resources are running out. Candidates:
  - cpu
  - amount of memory
  - memory bandwidth
  - i/o

and who is causing it
  - aolserver (e.g. lock-contention)
  - postgres (e.g. checkpoint handling, vacuuming, 
    some complex queries...)

Is the whole machine in a bad state (e.g. 
load etc.; how does it react on the console), 
or just the aolserver or the database?

how many cpus do you have on your system 
(fgrep processor /proc/cpuinfo)?

In response to your questions:
 - i have never seen thread destroy as a 
   performance problem, but under "normal
   conditions" not all threads end 
   at more or less the same time.
   Normally, the problem with the thread
   destroys is just that the threads might
   have to be recreated quite soon afterwards;
   thread creation is slow when the blueprint
   is large (when you have many packages
   installed). However, if one has a large
   blueprint, the thread cleanup has to 
   free all of its contents, which 
   might be a couple of million individual
   free operations. This might entail 
   quite a large number of memory locks 
   as well.

 - there are as well many reasons
   for possible waiting operations.
   Do you see error messages concerning
   DEADLOCKS in your database? OpenACS
   5.4 uses fewer locks (which were introduced
   to overcome some problems in PostgreSQL
   8.0 and 8.1). A quick check against
   pg_locks is sketched below.
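
   For example, a quick check against PostgreSQL's pg_locks view,
   run from ds/shell (just a sketch):

     # sketch: list lock requests that are currently not granted
     db_list_of_lists check_waiting_locks {
         select pid, mode, relation::regclass, granted
         from   pg_locks
         where  not granted
     }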

-gustaf neumann


Posted by Gustaf Neumann on
Eduardo,

the aolserver has the bad behavior that the thread termination parameters (maxconnections, threadtimeout) are likely to terminate all connection threads at the same time. This happens when these threads were started at more or less the same time, which is likely on busy sites or during benchmarks. This mass extinction of threads is not a good idea performance-wise, especially when most of the threads have to be recreated afterwards. Termination and recreation of threads are costly operations, leading to noticeable delays.

To overcome this problem i have committed a small change to aolserver 4.5 head that introduces a "spread", a randomization factor for the two mentioned thread termination parameters. The spread leads to slightly different values for maxconnections and threadtimeout per connection thread (the default is +- 20%). When spread is 0, one obtains the original behavior. You might be interested in trying this version.
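
Just to illustrate the idea (this is only a sketch in Tcl, not the actual C implementation):

   # sketch: derive a per-thread timeout with a +/- 20% spread
   set spread 0.2
   set threadtimeout 120
   set perThreadTimeout [expr {round($threadtimeout * (1.0 + $spread * (2.0*rand() - 1.0)))}]
   # with spread = 0.2 the effective value ends up between 96 and 144 seconds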

Posted by Tom Jackson on
Gustaf is right here. The AOLserver code does not take into account the size of the queue or the number of requests which could be served by a thread before allowing the thread to exit.

There is a basic logic problem in the code, but it really only shows up under test conditions. Two real world complicating factors are running very few threads (less than 10) and serving long running requests (slow client or huge data).

My experience is that external benchmark programs are less reliable than AOLserver, so it is very difficult to use their results. Most of the slowness in OpenACS will be in database queries, which will be impossible to detect using an external tool.

OTOH, if you expect hundreds of simultaneous visitors to a DB backed site, be happy with your success and look into distributing your site over a number of front end servers.

Posted by Eduardo Santos on
Hi Gustaf and Tom,

Thank you very much for your replies. I'm sorry for the time I've taken to post, but I had some personal issues to solve.

With our tests and observations, we are writing some kind of benchmark howto document that should go somewhere in the community, maybe in the XoWiki instance. Our tests had three branches of observation:

1 - AOLServer (OpenACS?) Tuning
2 - PostgreSQL Tuning
3 - OS Tuning

I'm going to try to give a brief description of the specific and general observations we could make:

<h4> AOLServer (OpenACS?) Tuning </h4>

Our first issue with AOLServer was the 4.5.0 parameters problem that somebody fixed with the file /acs-tcl/tcl/pools-init.tcl, as mentioned in Tom's message in this post. With a simple update we were able to solve this issue.

The other problem was with the thread creation process which, as you just said, is a mixed XoTcl + AOLServer problem. The most important thing we've realized is that the creation and destruction process consumes a lot of I/O operations. To improve the I/O performance we tried to change the file system, but it had no effect due to the most important thing we've found out: DON'T EVER USE VM's IN PRODUCTION SITES.

Our server was based on Xen VM's and it was impossible to get I/O performance with virtual machines. The whole thing about it is that there's no virtualization process able to split the blocks and inodes completely when using virtual machines, so all the I/O is shared between the VM's in the cluster. It's a little bit different from what happens with the logical partitions available on some hardware, such as IBM's big servers. In that case the files are completely separated and the partition works with no I/O issues.

Based on this observation, we've switched the production environment to a dedicated server with the specifications described in the first post of this thread, and most of the problems with thread creation and destruction are gone.

The next step was to adjust the configuration file. I guess the biggest challenge for everybody using OpenACS is the best relation between number of users X maxconnections X requests X maxthreads. This is the part where we are stuck right now. According to this post by Gustaf on the AOLServer list:

the question certainly is, what "concurrent users" means (...) note, that when talking about "concurrent users", it might be the case that one user has many requests concurrently running. To simplify things, we should talk about e.g. 10 concurrent requests.

Then you need
- at least 10 threads
- at least 10 connections
The number of database connections is more difficult to estimate, since
this is an application (oacs) matter. In oacs, most dynamic requests need
1 or 2 db connections concurrently.

I know this is not like a cookbook, but using his observation we can think of one thread, one connection and 2 db connections for each request. With these parameters, and considering that the tests perform all the configured requests at the same time, the results mostly follow this logic. When we set maxthreads to 150, there's a little bit of memory degradation when trying to serve 100 simultaneous requests. This degradation goes away when you set minthreads to 80 and maxthreads to 120. What we can take from this is that, with the best performance adjustments, one thread is able to serve one request.

However, when you take these settings to production, there's maybe the most difficult problem: estimating the number of requests per user. Maybe a patch to xotcl-request-monitor could answer this question, but we are still thinking about the best way to do it. We could also see that the number of connections per user is very different from a 1:1 relation, and this is another ratio for which we are trying to find the best value.

<h4> PostgreSQL Tuning </h4>

All the tests we have performed so far have the DB and OpenACS in the same box. The goal is to find the performance limit of this setting, so we can then move PostgreSQL off the machine and measure how much better it gets.

There's a lot of material about PostgreSQL on the Internet, and I'm not a specialist myself, but I guess there are some specific observations that can be made.

Everybody says that Internet applications spend most of their time doing SELECT operations in the database. This is a myth. If you consider the number of operations it may be true, but it isn't when you consider execution time and resource usage. The number of TPS (Transactions Per Second) in an application using OpenACS is very large, and that's a good thing. In the old Internet you were creating content for people to see. The new paradigm says the opposite: now the users create the content, and that's why I use OpenACS. There's just no better option on the market for building social and collaborative networks.

Concerning this matter, most PostgreSQL tuning involves transaction improvements. I still can't completely understand the pool mechanism that OpenACS uses for PostgreSQL, and I guess some improvement in this area could make my tests better.

The most important thing to adjust here is the shared memory mechanism. We've seen that, if you set too large a value in PostgreSQL and the OS, the memory shared between PostgreSQL and AOLServer can cause the system to crash under stress, and that's not a good thing. The I/O becomes a problem with a large number of INSERT and DELETE operations, mostly because the thread creation process is also heavy for the system.

The conclusion is: if you want to have the best performance, you really have to split AOLServer and PostgreSQL in different boxes. The exact point to do it (DB size, number of users) is what we are trying to find out.

<h4> OS Tuning </h4>

Maybe this is the most difficult part, because Linux has a lot of options that can be changed. A better analysis of resource usage is necessary so we can get better numbers. Here is a list of parameters we are changing on a Debian GNU/Linux Etch system:

# echo 2 > /proc/sys/vm/overcommit_memory
# echo 27910635520 > /proc/sys/kernel/shmmax
# echo 32707776 > /proc/sys/kernel/shmall
# echo deadline > /sys/block/sda/queue/scheduler
# echo 250 32000 100 128 > /proc/sys/kernel/sem
# cat /proc/sys/fs/file-max
753884
# echo 16777216 > /proc/sys/net/core/rmem_default
# echo 16777216 > /proc/sys/net/core/wmem_default
# echo 16777216 > /proc/sys/net/core/wmem_max
# echo 16777216 > /proc/sys/net/core/rmem_max

# su - postgres
postgres@nodo406:~$ ulimit 753883

There are some other things we are changing, such as kernel settings and TCP/IP configuration, but I don't think we have the best adjustments yet, so let's wait a little longer.

That's it. Sorry about the long post, but I guess I should give the community some qualified feedback, considering all the help you guys always give us.

Posted by Gustaf Neumann on
Eduardo, short comments on some of your statements:

- The last change in /acs-tcl/tcl/pools-init.tcl was 15 months ago. You might have been affected by a change in the default pool order, http://fisheye.openacs.org/browse/OpenACS/openacs-4/etc/config.tcl?r1=1.47&r2=1.48

- i don't see any kind of "XOTcl problem" in this discussion (maybe i am overly sensitive on this question). The only "problem" i see is that xotcl-core writes a message into the error log when aolserver threads exit. They would exit as well without XOTcl being involved.

- setting maxthreads to 150 is very high. We use on our production site (more than 2200 concurrent users, up to 120 views per second) a maxthreads of 100. By using background delivery, maxthreads can normally be reduced. Keep in mind that every connection thread (or thread for scheduled procs) contains a full blueprint of all used packages (these are 5.000-10.000 procs). In other words, for every thread there is a separate copy of all procs in memory. We are talking about 500.000 to maybe more than one million (!) tcl procs when 100 threads are configured. This is not lightweight. When all threads go down (as sketched in my earlier posting), all these procs are deleted and maybe recreated immediately in new threads.

- "difficult problem ... estimate the number of requests per user": The request monitor gives you on the start page the actual number of users and the actual number of requests per minute (in the graph), and the actual views/sec and the avg. view time per user. If you have an average view time (AVT) per user of say 62 seconds, then the views per user (VPU) per minute is VPU = 60/AVT = 0.96.
Therefore, 100 users generate U*VPU requests per minute (with the sample values: 96). Getting estimates on Requests per user is just the simple part. More complex is to figure out, how many threads one will need for that. One needs the average processing time, which is unfortunately depending in reality, how many other requests are currently running (all requests share common resources and are therefore not independent).

- Anyhow, the request monitor also shows you the number of currently running requests, and by observing this value one can estimate the number of required threads. If you have e.g. 10 connection threads configured, and you observe that often 8 or more requests are running, you are already running into limits. Once all connection threads are busy, the newly incoming requests are queued, causing further delays. Even simple requests (like e.g. a logo) can take significant time, since the response time for the user is queuing time + processing time. If the queuing time is 10 seconds, every request will take at least 10 seconds. If the machine was not able to process the incoming requests quickly enough with the configured threads in the past, then it is questionable when it will be able to catch up. So, queuing requests should be avoided. This is also one place where background delivery helps, since it frees connection threads much earlier to do other work. On the other hand, if you have e.g. 30 connection threads configured and you normally see only two or three concurrent requests, you are most probably wasting resources.

- i do not agree with the conclusion "for best performance, you really have to split AOLServer and PostgreSQL" onto different servers. It depends on the machines. Maybe you are correct with dual or quad-core Intel processors. If the database and aolserver are on the same machine, and you have enough memory and enough CPU power (e.g. 8 cores or more) and your machine has a decent memory throughput/latency, then local communication between database and aolserver is significantly faster. There are situations where performance is better when database and aolserver are on the same machine, even if the server is very busy (as on our learn@wu system). Certainly, it depends on the hardware and the usage patterns.

- concerning running in VMs: i completely agree, for a busy production site using VMs is not a good idea - at least not for beginners (tuning VMs and the hosting environment can help a lot, but there is yet another layer to fiddle around with). Usually you have only one cpu visible from inside the VM, so using multiple CPUs (one strength of aolserver + postgres) won't help, and scalability is limited. This is one experience we are currently going through with openacs.org.
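
To make the arithmetic above concrete, here is a small back-of-the-envelope sketch (the numbers are the sample values from above; the processing time is just an assumed example value):

   set AVT 62.0                              ;# avg. view time per user in seconds
   set users 100
   set VPU [expr {60.0 / $AVT}]              ;# views per user per minute, ~0.96
   set reqPerMinute [expr {$users * $VPU}]   ;# ~96 requests per minute
   # avg. number of concurrently running requests = request rate * avg. processing time
   set avgProcessingTime 0.4                 ;# seconds per view (assumed)
   set concurrent [expr {$reqPerMinute / 60.0 * $avgProcessingTime}]   ;# ~0.64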

have to rush
-gustaf

Posted by Gustaf Neumann on
PS: with upcoming processors like POWER7 http://en.wikipedia.org/wiki/POWER7 or Rock http://en.wikipedia.org/wiki/Rock_(processor), we will get 32 simultaneous threads (SMT) per chip! Linux will see this as 32 processors. Cool machines for running OpenACS and PostgreSQL.
Posted by Neophytos Demetriou on
True, if the only option (due to the architecture) is to scale up (as opposed to the ability to scale out).
Posted by Gustaf Neumann on
i don't see these options as alternatives, but as directions. It is bad if one has multiple cores in a machine and is not able to use them. The basic OpenACS infrastructure was built for multiple processing units. The point is simply: the micro-processor designers stopped increasing frequencies a few years ago and started adding cores and SMT instead. These chips arrive now, and Aolserver and OpenACS are ready to use them - NOW.

It would certainly be an interesting project to build something like OpenACS based on a P2P-network and DHTs (cassandra, bigtable, ... coral, cfs, oceanstore .... pier), but this will be a different project, since some basic assumptions about the data storage are different.

Posted by Neophytos Demetriou on
It is also bad if you need an expensive machine to be able to scale (read that as scale linearly). BTW, I would not classify Cassandra as a DHT in the same way that I would not classify PostgreSQL as a file system. The basic assumptions about the programming language and programming style were different for xowiki so I don't see your point about the project talk.
Posted by Gustaf Neumann on
The cost of processors per machine went down dramatically over the last years. Ten years ago, a dual processor machine had the cost of a mainframe (i remember alta-vista vs. google discussions). Today, a quad-core processor costs a few hundred dollars. i would expect a similar price for 32 cores a few years from now.

Sure, DHT is just the basic workhorse; most of the other examples also have quite different properties than e.g. cassandra. What is the problem with the term "project"? xotcl-core and xowiki set out to be compatible with the acs data model. There is nothing wrong in having a project developing an acs package based on p2p technology. Many people will appreciate it. From the scope of the work, it is a project.

Posted by Neophytos Demetriou on
Yes, from the scope of the work, it is a project. It's the *different* project part whose point I did not get ;)
Posted by Neophytos Demetriou on
Furthermore, different file systems have different properties but that does not make them ACID-compliant databases.

Now, an interesting argument would be the one that makes the case for the use of ACID properties for content-oriented applications and semi-structured data.

Posted by Brian Fenton on
Hi everybody,

this is a very interesting discussion.

Gustaf, you said: "by using background delivery, maxthreads can be normally reduced." Can you explain a little bit more how you use background delivery? What's a good case for background delivery, and what isn't? Can you point to any examples?

many thanks
Brian

Posted by Neophytos Demetriou on
Hi Brian, if you've got a lot of long-running threads, after a while you can no longer serve any requests. Using background delivery you can delegate the serving of large files to a single thread that manages multiple channels (sockets).

You can do the same thing for resources (css, javascript, images, graphics) using a reverse proxy. The idea is that you should not keep threads loaded with the full blueprint of your installation for these tasks.

The way you use background delivery is similar to ns_returnfile, i.e.

ad_returnfile_background 200 [ns_guesstype ${filename}] ${filename}

Though, you need to make sure that you have tthread installed and the background delivery code that Gustaf wrote.

PS. The NaviServer project went the extra mile of serving requests by loading the procs dynamically --- only those required for a given request. My understanding is that the xorb package does something similar but at a higher level of abstraction.

Posted by Gustaf Neumann on
i agree with most of the statements. background delivery is more general than the reverse proxy, since one can do all permission checking etc. as usual in openacs; just the spooling is done via a single thread. This single thread can spool several thousand requests simultaneously. You can't do that with threads (at least not without modifying the linux kernel). We ran into scalability problems with pound (thread based) which were solved by moving to nginx (async delivery).

neophytos, i think you meant tclthreads and not "tthread" (which sounds close to zoran's ttrace). The c-level code for background delivery is part of naviserver and aolserver 4.5 in the cvs head version. You find patches for aolserver 4.0 and plain 4.5 in the wiki.

xorb is very different from ttrace and is no replacement. ttrace can help to make threads thinner, at the cost of making introspection more complex/wrong. ttrace is not (at least was not) dependent on any naviserver code; Zoran wrote it for aolserver. ttrace has XOTcl support and vice versa.

The primary argument in the discussion above: assume you have 10 connection threads configured and you want to serve some larger files over some slower lines (600 MB files, delivery might take 2 minutes per request). If 10 people download these files via connection threads at the same time, the server won't be able to serve any requests for these 2 minutes, since all connection threads are busy; incoming requests will be queued. When you have bgdelivery installed, the connection thread is only used for permission checking and locating the file on disk (usually a few milliseconds). This leads to much better scalability.
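
A sketch of the pattern (a hypothetical file-download page; the permission check and the file lookup still run in the connection thread, only the spooling is handed over to the background delivery thread):

   # hypothetical download page
   ad_page_contract {
       Delivers a potentially large file via background delivery.
   } {
       item_id:integer
   }

   # permission check in the connection thread, as usual
   permission::require_permission -object_id $item_id -privilege read

   # locate the file on disk (my_lookup_file_path is a hypothetical helper)
   set filename [my_lookup_file_path $item_id]

   # hand the spooling over to the single bgdelivery thread
   ad_returnfile_background 200 [ns_guesstype $filename] $filename
   ad_script_abort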

Posted by Neophytos Demetriou on
about nginx: I did that as well but nginx has/had some trouble with the COMET stuff (as opposed to pound that works fine). Still, I'm using nginx.

about tclthreads: The package on TCL's file distribution area (on sf) is called thread-2.6.5 (not sure if that's what you mean). That package also includes ttrace, yes.

about ttrace: NaviServer has that integrated or makes use of it in order to provide tracing as an option. I haven't used that due to some problems that existed between that mechanism and XOTCL back when I tried (as soon as it was released). I remember sending you a message about that but I am not sure what's the current status.

about xorb: yes, I did not mean to imply that it is a replacement but it does serialize/deserialize code from a single thread and it also makes threads thinner, right? (just asking)

Posted by Neophytos Demetriou on
"The c-level code for background delivery is part of naviserver and aolserver 4.5 in the cvs head version. You find patches for aolserver 4.0 and plain 4.5 in the wiki."

If you use NaviServer, it's already there.
Posted by Neophytos Demetriou on
Please disregard the comment "if you use NaviServer, it's already there". The only way to actually use it is to get the cvs head version :)
Posted by Brian Fenton on
Hi Neophytos!
Good to see you on the forums again.

Ok, I got it - it's a drop-in replacement for ns_returnfile. Very useful to know about.

thanks,
Brian

Posted by Gustaf Neumann on
Since this thread turns out to be a kind of tutorial, here is some completion of the information: ad_returnfile_background is part of xotcl-core and has the following requirements
  1. xotcl-core (sure, if it is defined there)
  2. one needs either a "new" aolserver, or an "old" with the required patches (see discussion above and Boost...), and
  3. one has to have libthread installed (tcl thread library, as described in the request-monitor wiki page)
If you have e.g. only (1), you will have as well a function named ad_returnfile_background, but this will just call ns_returnfile.
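
The fallback idea is roughly the following (just a sketch, not the actual xotcl-core code):

   if {[info commands ::bgdelivery] eq ""} {
       # no background delivery available: deliver directly from the connection thread
       ad_proc ad_returnfile_background {statuscode mimetype filename} {
           Fallback that simply wraps ns_returnfile.
       } {
           ns_returnfile $statuscode $mimetype $filename
       }
   }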

Hope this helps -gustaf neumann

Posted by Rocael Hernández Rizzardini on
Brian, more info is here: Boost your application performance...
Posted by Eduardo Santos on
Hi Gustaf,

Your considerations are very helpful, as they always are. I now have a more general view of the threads vs. requests behavior. I'll try to make comments based on what you've posted.

- When I said there was an XoTcl problem, I was talking about how the destroy process can be painful depending on the OS + hardware combination. But as you put it better, it's not really an XoTcl problem, it's more like an AOLServer problem. I guess we agree here.

- I guess I could finally understand what the threads are meant to do. If every thread holds a full copy of all the procs in the system, instead of a shared portion of memory, I can see how 150 is too large a number. Thank you very much for this explanation. I was seeing in production an error saying that maxconnections was exceeded, even with machine resources available. Your explanation shows me that there were a lot of requests queued up waiting for a large download to finish, and background delivery could be a solution for that. I'll think about more testing on that and I'll give everybody feedback about it.

- About using PostgreSQL and AOLServer in the same box: I've seen in this post that you run learn@wu on IBM logical partitions. Is that right? In that case you have no degradation and the system allows you to do everything in the same box without I/O issues. If you consider that a socket connection is quite a bit faster than a TCP/IP one, it's probably a good idea to use the same box. Otherwise, I guess splitting is a better option.

That's it. Thank you for the help.

Posted by Gustaf Neumann on
Hi Eduardo,

even the mentioned problem (where many threads terminate at the same time) is gone with the current head version of aolserver.

i am not sure about your question in the last paragraph. Yes, our production server for learn@wu (aolserver + postgres 8.2.*) runs in a single partition of the machine, using 6 of the 8 processors. We still have the same hardware as at the time of my posting with the figures (2005, earlier in the same forum thread), but we are now up to more than 2000 concurrent users, more than 60.000 learning resources, etc., and still below 0.4 seconds per view. The second partition of the server is used for development. The logical partitions do not help to make anything faster and are more or less irrelevant to the discussion. We tested configurations with postgres or aolserver on other machines, but these setups were much slower.

As said earlier, if you run out of resources (e.g. cpu power), moving the database to a different (similar) server will certainly help. If one already has enough CPU and memory bandwidth, putting the database on a different machine will slow things down.

Posted by Eduardo Santos on
Hi Gustaf,

Now I've got what you said. You see: IBM has an OS split mechanism called LPAR, or logical partitions, as explained here. Using this feature is like having a VM, but without the software and hardware limitations that make it impossible to completely split the I/O between all the machines you have. I thought that was the case for your production system, which you now say it is not.

If you consider that a socket connection is quite faster than a TCP/IP connection, I can see that having both DB and AOLServer in the same box can be faster if you have enough available resources.

Concerning our tests, however, there's something showing up that can bring more issues to this matter. The Linux kernel has a parameter that sets the maximum amount of shared memory your applications can use in the system, which is located at /proc/sys/kernel/shmmax. This parameter controls, for example, the amount of memory PostgreSQL can use for the shared buffers. In our test cases, we had the following set of parameters:

## AOLServer parameters
ns_section ns/threads
    ns_param   mutexmeter         true      ;# measure lock contention
    # The per-thread stack size must be a multiple of 8k for AOLServer to run under MacOS X
    ns_param   stacksize          [expr 1 * 8192 * 256]

ns_section ns/server/${server}
    ns_param   maxconnections     1000      ;# Max connections to put on queue
    ns_param   maxdropped         0
    ns_param   maxthreads         300       ;# Tune this to scale your server
    ns_param   minthreads         200       ;# Tune this to scale your server
    ns_param   threadtimeout      120       ;# Idle threads die at this rate
    ns_param   globalstats        false     ;# Enable built-in statistics
    ns_param   urlstats           true      ;# Enable URL statistics
    ns_param   maxurlstats        1000      ;# Max number of URL's to do stats on

ns_section ns/db/pool/pool1
    # ns_param maxidle 0
    # ns_param maxopen 0
    ns_param   connections        200
    ns_param   verbose            $debug
    ns_param   extendedtableinfo  true
    ns_param   logsqlerrors       $debug
    if { $database == "oracle" } {
        ns_param driver     ora8
        ns_param datasource {}
        ns_param user       $db_name
        ns_param password   $db_password
    } else {
        ns_param driver     postgres
        ns_param datasource ${db_host}:${db_port}:${db_name}
        ns_param user       $db_user
        ns_param password   ""
    }

ns_section ns/db/pool/pool2
    # ns_param maxidle 0
    # ns_param maxopen 0
    ns_param   connections        150
    ns_param   verbose            $debug
    ns_param   extendedtableinfo  true
    ns_param   logsqlerrors       $debug
    if { $database == "oracle" } {
        ns_param driver     ora8
        ns_param datasource {}
        ns_param user       $db_name
        ns_param password   $db_password
    } else {
        ns_param driver     postgres
        ns_param datasource ${db_host}:${db_port}:${db_name}
        ns_param user       $db_user
        ns_param password   ""
    }

ns_section ns/db/pool/pool3
    # ns_param maxidle 0
    # ns_param maxopen 0
    ns_param   connections        150
    ns_param   verbose            $debug
    ns_param   extendedtableinfo  true
    ns_param   logsqlerrors       $debug
    if { $database == "oracle" } {
        ns_param driver     ora8
        ns_param datasource {}
        ns_param user       $db_name
        ns_param password   $db_password
    } else {
        ns_param driver     postgres
        ns_param datasource ${db_host}:${db_port}:${db_name}
        ns_param user       $db_user
        ns_param password   ""
    }

## Kernel Parameters:
/proc/sys/kernel/shmall    2097152
/proc/sys/kernel/shmmax    2156978176

## PostgreSQL Parameters:
max_connections = 1000      # (change requires restart)
shared_buffers = 2000MB     # min 128kB or max_connections*16kB
work_mem = 1MB              # min 64kB
max_stack_depth = 5MB       # min 100kB

With these values, the memory got corrupted and the database crashed under the stress test, which means that the PostgreSQL process shut down all connections and killed the postmaster process, leaving the server showing a message and neither shutting down nor coming back.

With all the information I have now, I can see this was probably caused by the huge number of threads I configured. The new threads were trying to get more and more memory, and the OS just couldn't give it. As both AOLServer and PostgreSQL were theoretically using the same memory area, it caused the DB to get corrupted and the process to be killed. If they were split onto different servers, maybe this wouldn't happen, and it made us think that different servers are always the best solution. No matter what you do wrong and what happens to the server, an AOLServer crash will not kill the DB process, and the DB will also not kill AOLServer. This could be the safer choice for this case.

Posted by Gustaf Neumann on

"If you consider that a socket connection is quite faster than a TCP/IP connection ..."

Well, one has to be more precise. One has to distinguish between a local socket (IPC socket, unix domain socket) and a networking socket (e.g. TCP socket). Local communication is faster (at least it should be) than remote communication.

As discussed above, the minthreads/maxthreads values of the configuration are not reasonable (i am not sure what you were trying to simulate/measure/...). I have certain doubts that the memory was "corrupted", but i have no problem believing that several processes crashed on your system when it ran out of memory. For running large applications one should become familiar with system monitors that measure resource consumption. This way one can spot shortages and adjust the parameters in reasonable ways.