Forum OpenACS Development: sec_sweep_sessions can hang forever if database breaks

Posted by Andrew Piskorski on
I just noticed the "excessive time taken by proc 7 (10679 seconds)" entry in my log above. 10679 seconds is about 3 hours. So that was the actual problem: due to the database access failure, this little bit of code apparently hung for the whole 3 hours, completely tying up the AOLserver scheduler thread:
ad_proc -private sec_sweep_sessions {} { 
    set expires [expr {[ns_time] - [sec_session_lifetime]}] 
    db_dml sessions_sweep {}  
    db_release_unused_handles 
} 
And the SQL that proc ran is also very simple, just:
delete from sec_session_properties 
where last_hit < :expires 

So at least on this Oracle 10g (10.2.0.2.0) server, under certain Oracle-wide error conditions a simple delete statement can hang forever, and will never time out. That makes some degree of sense. The delete statement needs to generate rollback, and the failure which froze up Oracle for those 3 hours was running out of either online or archived redo log space, which is intimately related to rollback.

Somewhat oddly, the delete statement above, which normally takes less than 1 second, started at 13:45, and ran for 37 minutes before another thread triggered the very first Oracle error in the AOLserver log. There was also a lot of other activity in the log during those 37 minutes, all of which worked fine without errors. Yet the delete was hanging all that time, which presumably means that Oracle was starting to have trouble well before it finally triggered client errors.

Posted by Steve Manning on
Yes, I was about to point out that "excessive time" entry, as it's exactly what happened to us with two scheduled events sharing the main thread.

It's actually easy to set up a scheduled job in its own thread; you just use the -thread switch, thus:

ad_schedule_proc -thread t -debug t 600 backoffice::export::send_promos
or
ad_schedule_proc -thread t -debug t -schedule_proc ns_schedule_daily {04 30} backoffice::export::send_stock_status

The downside, I believe, is that it sets up and tears down its own interp, but that's not usually too much of a problem.

- Steve

Posted by Andrew Piskorski on
Steve, yes, the -thread option to ns_schedule_proc (which ad_schedule_proc just passes along) is what I referred to above. If I remember correctly, that starts up a new thread for every individual run of the job, and then kills it. If the main scheduler thread is locked up, will the thread for the new job ever get spawned? I'd need to test it, but I suspect the answer is no, it will not.
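
A minimal sketch of such a test, assuming it is sourced once at server startup on a scratch AOLserver instance. The job bodies, log messages, and the 5/10/120 second timings are made up for illustration; only ns_schedule_proc, ns_sleep, and ns_log are real AOLserver commands:

```tcl
# Hypothetical test: does a -thread job still get spawned while the
# main scheduler thread is blocked by a non-threaded job?

# Job A: runs once, 5 seconds after startup, in the main scheduler
# thread, and holds that thread for 120 seconds (a stand-in for the
# hung db_dml).
ns_schedule_proc -once 5 {
    ns_log Notice "blocker: start, holding scheduler thread"
    ns_sleep 120
    ns_log Notice "blocker: done"
}

# Job B: runs every 10 seconds, each run in its own thread.
ns_schedule_proc -thread 10 {
    ns_log Notice "threaded job: ran"
}
```

If the "threaded job: ran" lines stop appearing in the server log between "blocker: start" and "blocker: done", then the threaded job is not being spawned while the scheduler thread is tied up. If they keep appearing every 10 seconds, it is.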