Forum OpenACS Development: Problem with db_transaction?

Collapse
Posted by Nis Jørgensen on
One of our servers (AOL 3.3+ad13, OpenACS 5.0.2 (approximately), postgres 7.4.2 ) occasionally hangs, not accepting new connections (and not running scheduled procs).

Restarting the server gets things back to normal.

At least two out of the last four times this happened, the last code being executed has been:

ns_log notice "mailing lists: try to record bounced mail"
catch {
if {[regexp {X-Field: ([0-9]*)-([0-9]*)} $body match user_id job_id]} {
db_transaction {
ns_log notice "mailing lists: bounce recorded for job $job_id and user $user_id"
db_dml record_bounce_in_log {}
if {[db_resultrows]} {
db_dml increment_bounce_count {}
}
}
}
}

The first ns_log statement is executed, the second one (inside the db_transaction) isn't. The regexp matches.

Can anyone think of any reason db_transaction should fail to initialize? Unable to allocate a db_handle? Or something else related to postgresql and transactions?

/Nis

Collapse
Posted by Nis Jørgensen on
OK - found what seems to be the problem. It seems a long-running query was locking up a db_handle (and a connection thread) every time it was run. Since I have more threads than handles, in the end db_handles run out.

Is there any way to monitor the number of handles in use (from the aolserver/driver point of view)?

And: Is it normal for queries to keep running in the postgres server after I restart AOLServer?

Collapse
Posted by Don Baccus on
Ahh, exhaustion of available db handles was going to be my guess ... cool.

Stopping AOLserver doesn't send a SIGTERM signal to the PG backend (or Oracle, for that matter, if you're using that RDMS). Should it? I don't think the answer is obviously "yes", just because you stop AOLserver doesn't mean that you don't want to record (say) the finalization of a credit card transaction that just cleared before you decided to restart AOLserver ... normally I think you'd want transactions that have started to run to completion.

As far as runaway queries on a handle, then ... if you have minimal stats enabled in PG you can query and find out exactly what query has run away with your backend and AFTER having checked it out manually, kill -TERM your backend or let it run to completion, whichever you prefer.

I think manual administration of the DB in this instance is better than having the AOLserver database driver "guess" for you, no?

Collapse
Posted by Andrew Piskorski on
Are these PG queries available and made easy to use by just clikcing on a web page in the OpenACS Monitoring package yet? If not, adding that would be an obviously valuable improvement.
Collapse
Posted by Don Baccus on
No, they aren't in monitoring, it would be nice if someone added pages to do this.
Collapse
Posted by Nis Jørgensen on
I certainly wouldn't be sending any (OS) signals to the db server when shutting down AOLServer. However I would expect the database driver to terminate it's connections to the dbms, which I would in turn expect to cause the running queries to terminate (like pressing Ctrl-C in a psql terminal).

From reading the docs here
http://www.postgresql.org/docs/7.4/interactive/protocol-flow.html#AEN52769 about cancellation and termination , it seems the problem is that postgres only checks for termination (normal or abnormal) BETWEEN queries.

Since this should normally not be happening, I do not think the behaviour is unreasonable. A cancellation interface could come in handy in other cases though (such as the ability to kill long running queries).

/Nis

Collapse
Posted by Don Baccus on
Ahem.

"I certainly wouldn't be sending any (OS) signals to the db server when shutting down AOLServer."

OK

"However I would expect the database driver to terminate it's connections to the dbms, which I would in turn expect to cause the running queries to terminate (like pressing Ctrl-C in a psql terminal)."

Sending an OS signal (SIGTERM) to the backend process is HOW psql terminates a query when you press Ctrl-C ...

And you're misreading the doc you cite, the backend isn't actively checking for a termination message, a new backend process is sent a termination message and it in turn bops the query-running process with a SIGTERM signal. Signals are CAUGHT, not POLLED ... all the doc sez is that the backend, while running a query, doesn't POLL to see if a terminate message has been sent but rather responds to a signal (which takes no overhead since the OS takes care of calling the catching function directly when it processes the signal).

Collapse
Posted by Don Baccus on
Sorry, to be more clear, it doesn't POLL while processing a query, but when it DOES catch a SIGTERM it will stop the current query from executing and return an error to the process which initiated it.

Bottom line, though, is that you stop a query by either issuing a SIGTERM yourself or by opening up another backend and have it do so for you.

Collapse
Posted by Nis Jørgensen on
Sending an OS signal (SIGTERM) to the backend process is HOW psql terminates a query when you press Ctrl-C ...

Doesn't look like it to me. As far as I can read from the source, psql sends a cancellation message to the server in another connection - which then probably sends a SIGTERM to the process.

And you're misreading the doc you cite, the backend isn't actively checking for a termination message, a new backend process is sent a termination message and it in turn bops the query-running process with a SIGTERM signal. Signals are CAUGHT, not POLLED ... all the doc sez is that the backend, while running a query, doesn't POLL to see if a terminate message has been sent but rather responds to a signal (which takes no overhead since the OS takes care of calling the catching function directly when it processes the signal).
I believe you misunderstood my use of the word "termination". I used it in the context of the postgresql protocol document, not OS signals.

As I read the doc:

  • Cancellation of a query happens by the client sending a message on another connection, as described above.
  • Termination of a connection happens either because of an explicit message sent from the client on the connection, OR when the backend discovers that the connection has been terminated by the client. I don't know how the last part happens, but apparently it is NOT happening during query execution, or at least it didn't for a couple of days on my production server.
  • /Nis

    PS: Can we please have the option back to reply to a reply?

    Collapse
    Posted by Don Baccus on
    It makes no sense to reply to a previous reply in a linear discussion forum, like this one.

    Your description is exactly what I said - a backend executing a query is stopped by issuing a SIGTERM to it. This can be done in two ways: by issuing it yourself or starting up a new backend and asking it to do a SIGTERM for you.

    As you point out (and as I pointed out) PSQL takes the second approach. Bottom line, though, is that you need to get a SIGTERM issued to the backend process to get it to stop.

    Collapse
    Posted by Andrew Piskorski on
    Don, yes it does make perfect sense to reply to a specific post in the openacs.org Forums. The Forums package keeps internal track of the tree reply structure of posts. And when the Forums package displays one single individual post, it also displays all children replies below it, tree-wise. Examples: one, two.

    Of course, Nis, as this post demonstrates, if you display a post singly the "Post a Reply" then replies to that specific post rather than the first post in the thread, maintaining the threaded/tree structure. This seems like a reasonable compromise between specificity for power users and UI simplicity for newbies.

    Collapse
    Posted by Jeff Davis on
    I do miss the ability to reply to a specific post as now I am perpetually having to open the thread in another window in order to be able to see what I am responding to.

    I think if we added a "reply with quote" link it would necessarily be per post, it's a feature we should have, and it would get me what I really want which is the post I am really replying to in the same window with my response).