Forum OpenACS Q&A: Should Keepalive be enabled in config.tcl?

In the current stock oacs /etc/config.tcl file "keepalive" is disabled.

Is this desired?

On numerous occasions lately AOLserver (3.3.1+ad13) has restarted on me in various situations: installing OpenACS, bulk uploading photos, etc.

This is what I got now:

...

[27/Nov/2003:16:43:01][2447.16384][-main-] Warning: Multiple definition of util_memoize in /web/oacsdev/packages/acs-tcl/tcl/memoize-procs.tcl and /web/oacsdev/packages/acs-bootstrap-installer/bootstrap.tcl
[27/Nov/2003:16:43:01][2447.16384][-main-] Notice: Bootstrap: Installation is not complete - sourcing the installer.
[27/Nov/2003:16:43:01][2447.16384][-main-] Notice: Sourcing files for postload...
[27/Nov/2003:16:43:01][2447.16384][-main-] Notice: Done.
[27/Nov/2003:16:43:01][2447.16384][-main-] Notice: Executing initialization code blocks...
[27/Nov/2003:16:43:01][2447.16384][-main-] Notice: tcl: generating interp init script
[27/Nov/2003:16:43:01][2447.16384][-main-] Warning: keepalive: insufficient maxkeepalive 0: keepalive disabled
[27/Nov/2003:16:43:01][2447.16384][-main-] Notice: nsmain: AOLserver/3.3.1+ad13 running
[27/Nov/2003:16:43:01][2447.16384][-main-] Notice: nsmain: security info: uid=1001, euid=1001, gid=1001, egid=1001
[27/Nov/2003:16:43:01][2447.32771][-sched-] Notice: sched: starting
[27/Nov/2003:16:43:01][2447.16384][-main-] Notice: serv: waiting for warmup
[27/Nov/2003:16:43:02][2447.16384][-main-] Notice: serv: warmed up
[27/Nov/2003:16:43:02][2447.16384][-main-] Notice: socks: idle
[27/Nov/2003:16:43:02][2447.16384][-main-] Notice: sched: idle
[27/Nov/2003:16:43:02][2447.16384][-main-] Notice: binder: listen(192.168.0.100,80) = 15
[27/Nov/2003:16:43:02][2447.16384][-main-] Notice: nssock: listening on 192.168.0.100:80
[27/Nov/2003:16:43:02][2447.131081][-nssock-] Notice: nssock: starting
[27/Nov/2003:16:43:02][2447.131081][-nssock-] Notice: nssock: accepting connections

...
That is, the installation of .LRN 2.0 got aborted while running.

(Note: I'm not even sure what "keepalive" is supposed to do.)

Have any of you seen these kinds of "phenomena" lately?

/Ola

Posted by Tom Ayles on

I believe keepalive refers to an HTTP connection state. After requesting the initial page, I think most browsers keep the HTTP connection in the 'keepalive' state to fetch resources referenced from the page, e.g. images and stylesheets. This reduces the overhead of having to create new HTTP connections for each item on a page.
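To make the connection reuse concrete, here is a minimal Python sketch. It has nothing to do with AOLserver itself: the handler class, the function names, and the three paths are all made up for illustration. It starts a throwaway local HTTP/1.1 server and fetches several resources over a single connection, recording the client's local port after each request. If the port never changes, the same TCP connection was reused.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class KeepAliveHandler(BaseHTTPRequestHandler):
    # HTTP/1.1 makes connections persistent unless either side asks to close.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        # A persistent connection needs Content-Length so the client
        # knows where one response ends and the next one begins.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_on_one_connection(host, port, paths):
    """Fetch every path over one connection; return the local port seen each time."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    local_ports = []
    try:
        for path in paths:
            conn.request("GET", path)
            response = conn.getresponse()
            response.read()  # drain the body before reusing the connection
            local_ports.append(conn.sock.getsockname()[1])
    finally:
        conn.close()
    return local_ports


def demo():
    """Start a local server on a free port and fetch three hypothetical resources."""
    server = HTTPServer(("127.0.0.1", 0), KeepAliveHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_on_one_connection(
            "127.0.0.1", server.server_address[1],
            ["/page.html", "/style.css", "/logo.png"])
    finally:
        server.shutdown()
        server.server_close()
```

Calling `demo()` returns one local port number per request. With keepalive working, all three entries are the same; without it, the client would have to open a new connection (and get a new local port) for each resource.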

From the AOLserver configuration reference:

Maximum number of connections which can use HTTP keep-alive; should be equal to MaxConnections

I've followed this suggestion, and nothing bad has happened to me so far!

Posted by Malte Sussdorff on
What you describe here has happened to us a couple of times. We basically got rid of the problem by installing a keepalive server (which is completely different from the config parameter) that checks the sanity of my OACS installation and, if it is down, restarts it. Though this only treats the symptoms, it is working. Sadly though, without a considerable speed increase in site-node-init.tcl (kudos to Timo for doing this), it is a PITA to restart a server (around 4 minutes with AIESEC).

Some other things I realized down that road:

- AOLserver4 crashes and needs to be restarted twice (first time it can't get hold of the ip:port)
- AOLserver4 wants to be killed, not terminated. Not sure if this is in protest of the terminator going into politics or if I'm just too anxious (but hey, 4 minutes to shut down should be enough ....).
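
For readers wondering what such a watchdog looks like, here is a rough Python sketch of the idea. This is not the keepalive server mentioned above; the function names, the three-strikes threshold, and the `restart` callback are assumptions made up for illustration. One check either resets the failure counter or, after enough consecutive failures, calls whatever restart action you supply.

```python
import urllib.error
import urllib.request


def is_healthy(url, timeout=10.0):
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, HTTP error status, etc.
        return False


def watchdog_step(url, restart, failures, max_failures=3, probe=is_healthy):
    """Run one health check and return the updated consecutive-failure count.

    restart is a caller-supplied function (hypothetical here) that restarts
    the server. It is called once max_failures checks in a row have failed.
    """
    if probe(url):
        return 0  # healthy: reset the counter
    failures += 1
    if failures >= max_failures:
        restart()
        return 0  # start counting again after the restart
    return failures
```

In practice something like cron or a small loop with a sleep would call `watchdog_step` every minute or so. Requiring several failures in a row matters when a restart takes minutes, since a single slow page should not trigger one.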

Posted by Jonathan Ellis on
Seems to me that in high-volume sites letting clients hold onto their connections when they're not actually requesting a page (http keepalive) is undesirable. I always set it to zero.
Posted by Andrew Piskorski on
Jonathan, I don't know much about HTTP keepalive but I strongly suspect your conclusion is flawed, as the keepalive feature of HTTP is specifically intended to reduce server load, not increase it.

By your reasoning, I imagine the thing to do would probably be to enable HTTP keepalive, but set keepalivetimeout to a relatively short time, maybe 4 seconds. That way if the client really does have another HTTP request (for images, style sheets, etc.) to make immediately, it can pipe it through the same already open, kept-alive TCP/IP connection, but the connection will go away soon if the client is idle.

Posted by Jonathan Ellis on
You could be right, but my impression was that keepalive is designed for situations where clients are expected to make many requests in a very short timeframe -- the example given is for image-heavy pages.  So I don't see how this will help when the client:request ratio is skewed the other way...
Posted by Andrew Piskorski on
Maybe, but I wouldn't be so sure that a site's traffic is skewed one way or another without first doing some log analysis.

For example, Jonathan, I bet your own www.carnageblender.com site, being an online game, has a high percentage of repeat multiple hits from the same user within just a few seconds of each other. In that case, perhaps AOLserver would be substantially more efficient with keepalive turned on. But perhaps not, and it's certainly possible that keepalive could make performance much worse, too... I don't see any way to know one way or another without empirical testing.

I googled briefly for HTTP keepalive performance tuning info. There were at least a few relevant July 2003 and Nov. 2002 threads on the AOLserver list, but mostly I didn't find anything useful: neither rules of thumb, nor details on how to properly do log analysis and/or load testing to figure out optimum keepalive settings for a given site, web server, and load.
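
As a starting point for that kind of log analysis, here is a rough Python sketch. It is only a first approximation under stated assumptions: the function name is made up, the input is assumed to be (client address, timestamp) pairs already parsed out of the access log, and "could have reused a connection" is approximated as "arrived within the keepalive timeout of the same client's previous request". It ignores browsers opening several parallel connections and many users sharing one proxy address.

```python
from datetime import datetime, timedelta


def reuse_fraction(requests, window_seconds=4.0):
    """Fraction of requests arriving within window_seconds of the
    previous request from the same client.

    requests: iterable of (client, datetime) pairs, in any order.
    """
    last_seen = {}  # client -> time of that client's most recent request
    reusable = 0
    total = 0
    for client, when in sorted(requests, key=lambda r: r[1]):
        total += 1
        previous = last_seen.get(client)
        if previous is not None and (when - previous).total_seconds() <= window_seconds:
            reusable += 1
        last_seen[client] = when
    return reusable / total if total else 0.0


# Tiny made-up sample: client "a" makes three requests, client "b" one.
base = datetime(2003, 12, 3, 15, 0, 0)
sample = [
    ("a", base),
    ("a", base + timedelta(seconds=1)),   # within 4 s of a's previous hit
    ("b", base + timedelta(seconds=2)),   # first hit from b
    ("a", base + timedelta(seconds=10)),  # too late, would need a new connection
]
print(reuse_fraction(sample))  # 1 of 4 requests -> 0.25
```

Running this for a few candidate timeouts (say 2, 4, 15 and 30 seconds) against a day of real logs would at least show whether a short keepalivetimeout catches most of the repeat hits, which is the empirical question raised above. It says nothing about the cost of holding idle connections open; that still needs load testing.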

The (now old and outdated?) C10k page barely mentions HTTP keepalive, but of course has links to all sorts of detailed server scalability info.

This is currently only of academic interest to me as I don't run a high traffic site. But some folks here do... So if you're reading this, please chime in with your HTTP keepalive tuning experience, as it would be best if we could give OpenACS adopters both a reasonable default keepalive setting in config.tcl and an explanation of why or when they might need to change it.

Posted by Ola Hansson on
We don't seem to gain much clarity in the KeepAlive matter and it obviously doesn't have anything to do with the installation problems I saw, and still see even though I've upgraded to AOLserver 4 (cvs HEAD) and reinstalled PostgreSQL 7.3.4 from the debian package on "unstable" ...

I am getting lots and lots of errors (from time to time) when I install a fresh copy of .LRN from the "dotlrn-2-0" and "oacs-5-0" branches.

Am I really the only one to get these errors?


[03/Dec/2003:15:18:12][1202.16384][-main-] Notice: nsmain: AOLserver/4.1 starting

...

[03/Dec/2003:15:18:54][1202.65541][-conn:oacsdev::1] Error: apm_package_install: Error installing Reference Data version 5.0.0a5: psql:acs-reference-create.sql:16: WARNING:  Error occurred while executing PL/pgSQL function acs_priv_hier_ins_del_tr
psql:acs-reference-create.sql:16: WARNING:  line 61 at SQL statement
psql:acs-reference-create.sql:16: ERROR:  cache lookup failed for opclass 2003

psql:acs-reference-create.sql:16: WARNING:  Error occurred while executing PL/pgSQL function acs_priv_hier_ins_del_tr
psql:acs-reference-create.sql:16: WARNING:  line 61 at SQL statement
psql:acs-reference-create.sql:16: ERROR:  cache lookup failed for opclass 2003

    invoked from within
"db_source_sql_file -callback $callback $path/$file_path"
    (procedure "apm_package_install_data_model" line 32)
    invoked from within
"apm_package_install_data_model -callback $callback -data_model_files $data_model_files $spec_file_path"
    invoked from within
"if { $load_data_model_p } {
            apm_package_install_data_model -callback $callback -data_model_files $data_model_files $spec_file_path
        }"
    ("uplevel" body line 42)
    invoked from within
"uplevel $body "
[03/Dec/2003:15:18:54][1202.65541][-conn:oacsdev::1] Error:

Failed to install Reference Data, version 5.0.0a5. The following error was generated:

psql:acs-reference-create.sql:16: WARNING: Error occurred while executing PL/pgSQL function acs_priv_hier_ins_del_tr
psql:acs-reference-create.sql:16: WARNING: line 61 at SQL statement
psql:acs-reference-create.sql:16: ERROR: cache lookup failed for opclass 2003

...

[03/Dec/2003:15:19:02][1202.65541][-conn:oacsdev::1] Error: apm_package_install: Error installing Content Repository version 5.0.0b4: psql:content-image.sql:47: WARNING: Error occurred while executing PL/pgSQL function content_type__refresh_view
psql:content-image.sql:47: WARNING: line 22 at for over select rows
psql:content-image.sql:47: ERROR: cache lookup failed for opclass 2002
psql:content-image.sql:58: ERROR: current transaction is aborted, queries ignored until end of transaction block
psql:content-image.sql:68: ERROR: cr_content_mime_map_ctyp_fk referential integrity violation - key referenced from cr_content_mime_type_map not found in acs_object_types
psql:content-image.sql:73: ERROR: current transaction is aborted, queries ignored until end of transaction block
psql:content-create.sql:1365: ERROR: cr_type_template_map_typ_fk referential integrity violation - key referenced from cr_type_template_map not found in acs_object_types

psql:content-image.sql:47: WARNING: Error occurred while executing PL/pgSQL function content_type__refresh_view
psql:content-image.sql:47: WARNING: line 22 at for over select rows
psql:content-image.sql:47: ERROR: cache lookup failed for opclass 2002
psql:content-image.sql:58: ERROR: current transaction is aborted, queries ignored until end of transaction block
psql:content-image.sql:68: ERROR: cr_content_mime_map_ctyp_fk referential integrity violation - key referenced from cr_content_mime_type_map not found in acs_object_types
psql:content-image.sql:73: ERROR: current transaction is aborted, queries ignored until end of transaction block
psql:content-create.sql:1365: ERROR: cr_type_template_map_typ_fk referential integrity violation - key referenced from cr_type_template_map not found in acs_object_types

    invoked from within
"db_source_sql_file -callback $callback $path/$file_path"
    (procedure "apm_package_install_data_model" line 32)
    invoked from within
"apm_package_install_data_model -callback $callback -data_model_files $data_model_files $spec_file_path"
    invoked from within
"if { $load_data_model_p } {
            apm_package_install_data_model -callback $callback -data_model_files $data_model_files $spec_file_path
        }"
    ("uplevel" body line 42)
    invoked from within
"uplevel $body "
[03/Dec/2003:15:19:02][1202.65541][-conn:oacsdev::1] Error:

Failed to install Content Repository, version 5.0.0b4. The following error was generated:

psql:content-image.sql:47: WARNING: Error occurred while executing PL/pgSQL function content_type__refresh_view
psql:content-image.sql:47: WARNING: line 22 at for over select rows
psql:content-image.sql:47: ERROR: cache lookup failed for opclass 2002
psql:content-image.sql:58: ERROR: current transaction is aborted, queries ignored until end of transaction block
psql:content-image.sql:68: ERROR: cr_content_mime_map_ctyp_fk referential integrity violation - key referenced from cr_content_mime_type_map not found in acs_object_types
psql:content-image.sql:73: ERROR: current transaction is aborted, queries ignored until end of transaction block
psql:content-create.sql:1365: ERROR: cr_type_template_map_typ_fk referential integrity violation - key referenced from cr_type_template_map not found in acs_object_types

...

[03/Dec/2003:15:19:02][1202.65541][-conn:oacsdev::1] Error: apm_package_install: Error installing Reference Data - Timezone version 5.0.0a5: psql:/tmp/psql-copyfile-PYr4ra:1: ERROR: GetTypeElement: Cache lookup of type 23 failed
psql:/tmp/psql-copyfile-PYr4ra:1: lost synchronization with server, resetting connection

psql:/tmp/psql-copyfile-PYr4ra:1: ERROR: GetTypeElement: Cache lookup of type 23 failed
psql:/tmp/psql-copyfile-PYr4ra:1: lost synchronization with server, resetting connection

    invoked from within
"db_load_sql_data -callback $callback $path/$file_path"
    (procedure "apm_package_install_data_model" line 52)
    invoked from within
"apm_package_install_data_model -callback $callback -data_model_files $data_model_files $spec_file_path"
    invoked from within
"if { $load_data_model_p } {
            apm_package_install_data_model -callback $callback -data_model_files $data_model_files $spec_file_path
        }"
    ("uplevel" body line 42)
    invoked from within
"uplevel $body "
[03/Dec/2003:15:19:02][1202.65541][-conn:oacsdev::1] Error:

Failed to install Reference Data - Timezone, version 5.0.0a5. The following error was generated:

psql:/tmp/psql-copyfile-PYr4ra:1: ERROR: GetTypeElement: Cache lookup of type 23 failed
psql:/tmp/psql-copyfile-PYr4ra:1: lost synchronization with server, resetting connection

...

[03/Dec/2003:15:19:10][1202.65541][-conn:oacsdev::1] Error: Error sourcing /var/lib/aolserver/oacsdev/packages/acs-bootstrap-installer/installer/install.tcl: Selection did not return a value, and no default was provided
    while executing
"db_string pretty_name_from_key {select pretty_name from apm_enabled_package_versions ..."
    (procedure "apm_package_instance_new" line 4)
    invoked from within
"apm_package_instance_new -package_id $package_id -package_key $package_key -instance_name $package_name -context_id $context_id"
    (procedure "site_node::instantiate_and_mount" line 37)
    invoked from within
"site_node::instantiate_and_mount -package_key acs-content-repository"
    (procedure "apm_mount_core_packages" line 19)
    invoked from within
"apm_mount_core_packages"
    (procedure "install_do_packages_install" line 48)
    invoked from within
"install_do_packages_install"
    (file "/var/lib/aolserver/oacsdev/packages/acs-bootstrap-installer/installer/install.tcl" line 52)
    invoked from within
"source $__file "
EOF

/Ola
Posted by Ola Hansson on
Duh! I forgot to say:

But thanks for the interesting KeepAlive discussion anyway!

/Ola

Posted by Jeff Davis on
It sure looks like a broken pg install to me. In particular it seems to me like this could only be an internal pg problem...
psql:acs-reference-create.sql:16: WARNING:  Error occurred while executing PL/pgSQL function acs_priv_hier_ins_del_tr
psql:acs-reference-create.sql:16: WARNING:  line 61 at SQL statement
psql:acs-reference-create.sql:16: ERROR:  cache lookup failed for opclass 2003
acs_priv_hier_ins_del_tr has not changed since May.

This also looks quite suspicious:

Error: apm_package_install: Error installing Reference Data - Timezone version 5.0.0a5:
psql:/tmp/psql-copyfile-PYr4ra:1: ERROR:  GetTypeElement: Cache lookup of type 23 failed
psql:/tmp/psql-copyfile-PYr4ra:1: lost synchronization with server, resetting connection
                                                                                
since this is just issuing sql commands fed in by psql.
Posted by Jeff Davis on
As a data point, I just did an install of a clean oacs-5-0 checkout on postgres w/o any problems.