Forum OpenACS Development: Re: OpenACS Performance Tests

Collapse
Posted by Eduardo Santos on
Hi Gustaf,

Now I've got what you said. You see: IBM has an OS split mechanism called LPAR or logical partitions, as it's explained here. Using this feature is like if you have a VM, but without the software and hardware limitations that makes it impossible to split completely the I/O between all the machines you have. I thought that was the case for your production system, wich now you say it's not.

If you consider that a socket connection is quite faster than a TCP/IP connection, I can see that having both DB and AOLServer in the same box can be faster if you have enough available resources.

Concerning our tests, however, there's something showing up that can bring more issues to this matter. Linux Kernel has a parameter that sets the maximum amount of shared memory your applications can use in the system, wich is located at /proc/sys/kernel/shmmax. This parameter controls, for example, the amount of memory PostgreSQL can use in the shared buffers. In our test cases, we had the following set of parameters:

## AOLServer parameters
ns_section ns/threads
ns_param mutexmeter true ;# measure lock contention
# The per-thread stack size must be a multiple of 8k for AOLServer to run under MacOS X
ns_param stacksize [expr 1 * 8192 * 256]

ns_section ns/server/${server}
ns_param maxconnections 1000 ;# Max connections to put on queue
ns_param maxdropped 0
ns_param maxthreads 300 ;# Tune this to scale your server
ns_param minthreads 200 ;# Tune this to scale your server
ns_param threadtimeout 120 ;# Idle threads die at this rate
ns_param globalstats false ;# Enable built-in statistics
ns_param urlstats true ;# Enable URL statistics
ns_param maxurlstats 1000 ;# Max number of URL's to do stats on

ns_section ns/db/pool/pool1
# ns_param maxidle 0
# ns_param maxopen 0
ns_param connections 200
ns_param verbose $debug
ns_param extendedtableinfo true
ns_param logsqlerrors $debug
if { $database == "oracle" } {
ns_param driver ora8
ns_param datasource {}
ns_param user $db_name
ns_param password $db_password
} else {
ns_param driver postgres
ns_param datasource ${db_host}:${db_port}:${db_name}
ns_param user $db_user
ns_param password ""
}

ns_section ns/db/pool/pool2
# ns_param maxidle 0
# ns_param maxopen 0
ns_param connections 150
ns_param verbose $debug
ns_param extendedtableinfo true
ns_param logsqlerrors $debug
if { $database == "oracle" } {
ns_param driver ora8
ns_param datasource {}
ns_param user $db_name
ns_param password $db_password
} else {
ns_param driver postgres
ns_param datasource ${db_host}:${db_port}:${db_name}
ns_param user $db_user
ns_param password ""
}

ns_section ns/db/pool/pool3
# ns_param maxidle 0
# ns_param maxopen 0
ns_param connections 150
ns_param verbose $debug
ns_param extendedtableinfo true
ns_param logsqlerrors $debug
if { $database == "oracle" } {
ns_param driver ora8
ns_param datasource {}
ns_param user $db_name
ns_param password $db_password
} else {
ns_param driver postgres
ns_param datasource ${db_host}:${db_port}:${db_name}
ns_param user $db_user
ns_param password ""
}

## Kernel Parameters:
/proc/sys/kernel/shmall 2097152
/proc/sys/kernel/shmmax 2156978176

## PostgreSQL Parameters:
max_connections = 1000 # (change requires restart)
shared_buffers = 2000MB # min 128kB or max_connections*16kB
work_mem = 1MB # min 64kB
max_stack_depth = 5MB # min 100kB

With these values, the memory was corrupted and caused the database to crash under the stress test, wich means that PostgreSQL process did shut down all connections and killed the postmaster process, making the server to show a message and don't shut down or come back.

With all the information I have now, I can see this was probably caused by the huge amount of threads I've put. The new threads where trying to get more and more memory, and the OS just couldn't give it. As both AOLServer and PostgreSQL where theoretically using the same memory area, it caused the DB to corrupt and to kill the process. If they where split in different servers maybe this wouldn't happen, and it made us think that different servers are always the best solution. No matter what you do wrong and what happens to the server, the AOLServer crash will not kill the DB process and DB will also not kill AOLServer too. This could be the safer choice for the case.

Collapse
Posted by Gustaf Neumann on

If you consider that a socket connection is quite faster than a TCP/IP connection ...
Well one has to be more precise. One has to distinguish between a local socket (IPC socket, unix domain socket) and a networking socket (e.g. TCP socket). Local communication is faster (at least it should be) than remote communication.

As discussed above, the minthreads/maxthreads values of the configuration are not reasonable (not sure what you were trying to simulate/measure/...). I have certain doubts, that the memory was "corrupted", but i have no problems to believe that several processes crashed on your system when it run out of memory. For running large applications one should become familiar with system monitors that measure resource consumptions. This way one can spot shortages and adjust the parameters in reasonable ways.