Forum OpenACS Q&A: ACS Hangs after "Running scheduled proc sec_sweep_sessions"

I am running ACS 4.2 Beta, AOLserver 3.1, and Oracle 8iR2 on Red Hat
Linux 6.2 with kernel 2.2.20.

At seemingly random times my ACS server stops
responding to web browser requests. During the most recent occurrence,
the error.log showed the following entries just before the system
stopped responding entirely.

[04/May/2002:17:44:59][783.6393863][-conn4033-] Notice: SQL():
    select content_item.get_live_revision(content_item.get_id
(:path,file_storage.get_root_folder(27510))) as version_id from dual

[04/May/2002:17:44:59][783.6393863][-conn4033-] Notice: bind
variable 'path' = 'randy_test_file/test/gifts.zip'

[04/May/2002:17:44:59][783.6393863][-conn4033-] Notice: SQL():

    select count(*)
      from dual

    where acs_permission.permission_p
(:object_id, :user_id, :privilege) = 't'

[04/May/2002:17:44:59][783.6393863][-conn4033-] Notice: bind
variable 'object_id' = '60158'

[04/May/2002:17:44:59][783.6393863][-conn4033-] Notice: bind
variable 'user_id' = '1464'

[04/May/2002:17:44:59][783.6393863][-conn4033-] Notice: bind
variable 'privilege' = 'read'

[04/May/2002:17:44:59][783.6393863][-conn4033-] Notice: SQL():

select mime_type

from  cr_revisions
where  revision_id = :version_id

[04/May/2002:17:44:59][783.6393863][-conn4033-] Notice: bind
variable 'version_id' = '60158'

[04/May/2002:17:44:59][783.6393863][-conn4033-] Notice: SQL():
select content
                            from  cr_revisions
                            where  revision_id = 60158

[04/May/2002:17:45:30][783.6393863][-conn4033-] Error:
ora8.c:4472:stream_write_lob error writing to connection.  incomplete
write of 0 out of 16384

[04/May/2002:17:56:20][783.6408209][-sched:5-] Notice: Running
scheduled proc acs_mail_process_queue...

[04/May/2002:17:59:25][783.6411284][-sched:6-] Notice: Running
scheduled proc acs_messaging_process_queue...

[04/May/2002:18:41:18][783.2051][-sched-] Notice: Running scheduled
proc sec_sweep_sessions...

The log file shows one of our website admins downloading a file from
the server. Partway through the download the web browser hung and
the server stopped responding. About an hour later, "Running scheduled
proc sec_sweep_sessions" is the last entry in the error log before the
system stops responding to web browser requests, so it looks as though
that proc hangs the system. Typically the browser displays no message
at all and the website simply cannot be browsed. Sometimes
a "Server Busy" message will appear.

When I logged in to the system I performed a "ps ax" and noticed two
or three <defunct> oracle processes. I also noticed a very long
string of nsd8x processes.

My knowledge of the development side is limited, but judging from
the log file it looks as if the Oracle driver may be
causing the problem, and perhaps nsd8x is having
difficulty binding to Oracle. I apologize for not recording the
process list at the time.

Has anyone seen this type of thing happen before?

Thanks

Here is what I would check:

1.  Old version of AOLserver: you should upgrade.

2.  Old version of the Oracle driver that has a bug.

3.  Idle timeout for database connections: check the MaxOpen and MaxIdle settings in your nsd.tcl file and set them to very high numbers.

4.  Size of your /tmp directory: it may be too small for the file being downloaded.

These are the things I would check first.
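
Since the process list was not recorded last time, here is a rough sketch of how the state could be captured the next time the server hangs, covering the defunct processes, the nsd8x count, and the free space in /tmp. It assumes a Linux ps that accepts "-o stat=" and "-o comm="; the function names are made up for illustration, and nsd8x is simply the binary name from the post above.

```shell
#!/bin/sh
# Hypothetical helpers for capturing server state during a hang.

# Count zombie processes (shown as <defunct> by "ps ax");
# their state code starts with Z.
count_defunct() {
    ps -e -o stat= 2>/dev/null | awk '$1 ~ /^Z/ { n++ } END { print n + 0 }'
}

# Count processes whose command name matches exactly, e.g. nsd8x.
count_named() {
    ps -e -o comm= 2>/dev/null | awk -v name="$1" '$1 == name { n++ } END { print n + 0 }'
}

# Free space, in kilobytes, on the filesystem holding the given directory.
tmp_free_kb() {
    df -Pk "${1:-/tmp}" | awk 'NR == 2 { print $4 }'
}

# Print a one-shot summary of the three values above.
hang_report() {
    echo "date:         $(date)"
    echo "defunct:      $(count_defunct)"
    echo "nsd8x procs:  $(count_named nsd8x)"
    echo "/tmp free KB: $(tmp_free_kb /tmp)"
}
```

After sourcing these functions, appending the output of hang_report to a file outside /tmp during the next hang would give something concrete to compare against the size of the file being downloaded.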