Forum OpenACS Q&A: ns_ora, memory leak, server dies

Posted by Nis Jørgensen on
I am having a problem with my production server. The server dies in the middle of a request (and automatically restarts).

The log file looks like this:

[18/Apr/2004:18:42:08][14612.10251][-conn7-] Error: ora8.c:4472:stream_write_lob error writing to connection.  incomplete write of 0 out of 16384

[info from other connections left out]

[18/Apr/2004:18:45:17][14612.10251][-conn7-] Error: ora8.c:1062:ora_open_db: error in `OCIServerAttach ()': ORA-12535: TNS:operation timed out
SQL: [nil]
[18/Apr/2004:18:45:17][14612.10251][-conn7-] Notice: RP (192368.736 ms): error in rp_handler: serving GET /multimedia/download/1/48053/1/Patent-WebVersion01.rm
        ad_url "/multimedia/download/1/48053/1/Patent-WebVersion01.rm" maps to file "/data/web/planet/packages/gp-multimedia/www/download/index.vuh"
errmsg is can't write to connection for writing.  received error Connection reset by peer
[18/Apr/2004:18:45:17][17393.1024][-main-] Notice: nsmain: AOLserver/3.3.1+ad13 starting

(The last message is the server starting up again after dying.)

AOLserver doesn't always die at this point. When it doesn't, there is another error message, like this:

[21/Apr/2004:10:48:13][10906.9226][-conn6-] Error: GET http://www.greenpeace.org/multimedia/download/1/142608/2/NoWar_VF_WinMedia_v1.wmv?
referred by "http://www.greenpeace.org/france_fr/multimedia/media-view?type=gp_video&start_row=3&campaign_id="
can't write to connection for writing.  received error Connection reset by peer
    while executing
"ns_ora write_blob nsdb0 {
                select content2
                from gp_media
                where gp_media_id = 142608}"
    ("uplevel" body line 1)
    invoked from within
[further stacktrace removed]

The problem seems to occur when the server is low on memory, and available memory decreases rapidly over time (roughly 50 MB/hour).

So this seems likely to be a combination of a memory leak and an ns_ora problem, though which is cause and which is effect is not clear to me at the moment.
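To put a number on a memory decrease like the one described above, free memory can be sampled at intervals and converted to a rate. The following is only a minimal sketch, assuming a Linux host with /proc/meminfo; the function names and the log file name are made up for illustration and are not part of AOLserver or nsoracle.

```shell
#!/bin/sh
# Assumption: Linux host exposing /proc/meminfo. All names are illustrative.

# Print the MemFree value (in kB) from a meminfo-format file.
# Defaults to /proc/meminfo when no file is given.
mem_free_kb() {
    awk '/^MemFree:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Given an earlier sample (kB), a later sample (kB) and the number of
# seconds between them, print the decrease in MB per hour.
# A positive result means free memory is shrinking.
loss_mb_per_hour() {
    awk -v a="$1" -v b="$2" -v s="$3" \
        'BEGIN { printf "%.1f\n", (a - b) / 1024 * 3600 / s }'
}

# Usage (uncomment to run): append a timestamped sample once a minute.
# while :; do
#     echo "$(date '+%Y-%m-%d %H:%M:%S') $(mem_free_kb)"
#     sleep 60
# done >> memfree.log
```

Note that MemFree can also fall because the kernel is filling its page cache, so a falling number alone does not prove a leak in the server process.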

Any help would be muchly appreciated.

Nis
Greenpeace.

Posted by Samer Abukhait on
The server is not able to write to the database.

Is Oracle on the same server as AOLserver?

Is the web user able to connect to Oracle and write?

Test everything separately from AOLserver: as the web user, run

sqlplus dbuser

and create and write something.

If everything is fine, then you might suspect memory or CPU; your only judge will be 'top'.
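For watching one process over a longer period than an interactive 'top' session allows, its resident memory can be logged from a script. This is a rough sketch, assuming Linux, where /proc/<pid>/status contains a VmRSS line; the function name and the PID file path are hypothetical.

```shell
#!/bin/sh
# Assumption: Linux, where /proc/<pid>/status has a "VmRSS:" line.

# Print the resident set size (in kB) of the process with the given PID.
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# Usage (uncomment to run; the PID file path is a placeholder):
# pid=$(cat /path/to/nsd.pid)
# while :; do
#     echo "$(date '+%H:%M:%S') $(rss_kb "$pid")"
#     sleep 60
# done
```

A value that keeps climbing while traffic stays flat would point at the server process itself rather than at the rest of the system.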

Posted by Nis Jørgensen on
  • Oracle is on a different server.
  • The problem happens when _reading_ a blob from the db. If I am not mistaken (which I might very well be), stream_write_lob writes a LOB from the db to the connection.
  • The problem is not reproducible on demand, but it is frequent.
    Posted by Jeff Davis on
    Which version of the driver and which version of Oracle are you using?
    Posted by Samer Abukhait on
    I guess the browsing client's connection might be a reason for this type of problem, but I don't know exactly.

    As Oracle resides on a different server, you have to be sure that the driver is compatible with the Oracle server.

    Posted by Nis Jørgensen on
    AOLserver/3.3.1+ad13
    Oracle 8.1.7
    ArsDigita Oracle Driver version 2.5
    Posted by Nis Jørgensen on
    Just saw this in nsoracle/BUGS:

    Known Bugs
    1. The cleanup after errors after stream_write_lob is very heavy-handed, and
    should be fixed to use a better cleanup after interrupting a multipart LOB get.

    This seems to be relevant to this case.

    Posted by Andrew Piskorski on
    Nis, that version of the Oracle driver is ancient. You should be using nsoracle 2.7 from SourceForge.
    Posted by Nis Jørgensen on
    I will look at upgrading my Oracle driver. I have a few other things that I will try first (one thing at a time when debugging).
    Posted by Nis Jørgensen on
    OK. Driver upgraded, problem persists. Any other ideas?
    Posted by Guan Yang on
    I'm also experiencing this. Server is Oracle 9i (9.2.something), nsoracle-2.7, aolserver-4.0.2.

    Guan

    Posted by Barry Books on
    What does the code look like? I think there is a bug if you do something like
    db_dml in { update t set blob_col = :1 }
    
    It will work if :1 is small enough. If it's too big it causes a crash.
    Posted by Guan Yang on
    According to this post on the AOLserver mailing list, upgrading to the 9.2.0.3/4 client should fix the issue. It appears nsoracle is using some old OCI calls (rather than the new spiffy 9i calls, including one which allows out-of-order fetches from result sets). The old calls are still supported, but apparently not as thoroughly tested by Oracle.
    Posted by Guan Yang on
    I upgraded the client to 9.2.0.5.0 and the leak seems to be gone.