Forum OpenACS Q&A: Selecting an empty (null) oracle-clob crashes AOLserver on Linux

Greetings,

We've been using AOLserver with Oracle in Unix/Sun platform; recently we've attempted to expand our servers with running AOLserver (3.4.2) on Intel/Linux (RH 6.2 & 7.1). The major issue has been frequent AOLserver crashes with no pattern, after a long and painful troubleshooting process, we found the cause of these crashes, and that is: selecting an empty clob using ACS API (or ns_db procs). Running the same query via SQL/Plus from this Linux box, works fine.

Note that our Oracle database server (8.1.6.3) is running on a Solaris box and the Oracle client on Linux is 8.1.7.0, the ns_ora (Oracle Driver) version is 2.6.

There are no errors coming to logs, only on the DB logs we see the following at the time of crashes:

ORA-07445: exception encountered: core dump [ttci2u()+2356] [SIGSEGV] [Address not mapped to object] [0] [] []

Any solution/work around or advice that can prevent the crash in this scenario will be a big breakthrough for us.

Thanks in advanced for your help.

Seena K.

Sounds like you're likely encountering an AOLserver Oracle driver bug, but version 2.6 of the driver is truly ancient. I suggest trying the latest Oracle driver from SourceForge CVS. If that doesn't fix the problem, you should probably submit a bug report on SourceForge. If it does fix the problem, you should probably bug the AOLserver maintainers to release a more recent version of the Oracle driver as a tarball for download.
Also, why are you still running Oracle 8.1.6.3 on your server? Is some dependency preventing you from upgrading? I know 8.1.7.4 fixed a long list of misc. bugs in 8.1.7.0, so presumably 8.1.6.3 had even more bugs to fix.

Also, I don't whether or not there are any potential incompatability issues between 8.1.6 and 8.1.7 versions of the client vs. server libraries, like you're using. If you haven't already you should probably look into that. But without knowing more, I'd assume that just upgrading everything to 8.1.7.4 would be a good idea.

(Then finally upgrade to 9.x along with the rest of us sometime later, of course...)

(Then finally upgrade to 9.x along with the rest of us sometime later, of course...)

At least after the memory leak in the Oracle 9i client is fixed...

See this thread on the AOLserver list.

This is a bug in the Oracle client library (OCI) and is known to Oracle.  We ran across it at Greenpeace in June, 2002 and Oracle confirmed that it is a bug that crops up when you run a client application on X86 Linux that talks to an Oracle server on Solaris.  We were not the first to report it and it fails with other clients, not just the db driver used by AOLserver.

However, what we were seeing was Oracle itself crashing on Solaris ... rather than the client.

You say the only error you get is in the DB log, which logs a core dump on the Solaris side.  Isn't this an indication that your Oracle server on Solaris is crashing, too?  Not the client in Linux (AOLserver)?  My guess is that this is the case and that you've got scripts set up to restart Oracle if it dies.

Our workaround was to set CLOBs to the empty string rather than empty_clob().

Supposedly this was being fixed in OCI - surely by 9i it must be fixed?

As to why SQL*Plus doesn't crash ... Oracle didn't say but clearly it's not using the same OCI library the rest of us use.

Apparently upgrading to the 9.2.0.3 OCI library fixed the memory leak problem...
Don,

Is this work-around in OpenACS 5.0?

Does this problem still exist with later versions of the software (Oracle 8.1.7.4, later version of the Oracle client for Linux)?

/Lars

Just so you know the latest version of Oracle that we tried prevented the AOLserver crash but still gives errors on Oracle side.

Also we learned that updating the empty clobs with *NULL* apparently solves the issue, below is a little script I came up which reads the list of tables with their CLOB column and then converts them. the key element is dbms_lob.getlength($ColumnName)=0 .

Hope this helps.

--Seena

#####################################################

set TABLES_COLUMNS_List [list \
\
[list ACTIVITY_GUIDE_INFO              JAVASCRIPT            ] \
[list BOOKS                            BOOK_DESCRIPTION      ] \
]

set db [ns_db gethandle]

foreach TableColumn $TABLES_COLUMNS_List {

    set TableName  [lindex $TableColumn 0]
    set ColumnName [lindex $TableColumn 1]

    set SQL {
        --
        SELECT count(rowid) FROM $TableName
        WHERE dbms_lob.getlength($ColumnName)=0
    }

    set UpdateSQL [subst { UPDATE $TableName SET $ColumnName=NULL WHERE dbms_lob.getlength($ColumnName)=0 } ]

    ns_db dml $db $UpdateSQL]

}

ns_db releasehandle $db

############################################################

We experianced the same issue with Aolserver running on linux connecting to a Solaris Oracle 8i server. We were running v2.6 of the nsoracle driver. Upgrading to v2.7 solved our problem.