Forum OpenACS Q&A: Installation of ACS 4.2 and Oracle 9i on Linux

Hi all,

I've been trying to install ACS 4.2 and Oracle 9i on a Red Hat 7.2
box and being a newbie to ACS and Oracle I've finally reached a
problem that I can't find an answer too.

I've managed to install the Kernel and Core modules but when the
install tries to create the site-wide administrator I get an "ORA-
03113: end-of-file on communication channel" error.

Here are is a partial copy of the error and part of the Oracle trace
file.  Any pointers on where to go next would be very much
appreciated.

FIRST PART OF ERROR:

ora8.c:3569:ora_tcl_command: error in `OCIStmtExecute ()': ORA-03113:
end-of-file on communication channel

SQL:
    begin
        :1 := acs.add_user(user_id => :user_id,
             email => :email,
             url => :url,
             first_names => :first_names,
             last_name => :last_name,
             password => :hashed_password,
                    salt => :salt,
                    password_question => :password_question,
                    password_answer => :password_answer,
                    creation_ip => :peeraddr,
                    email_verified_p => :email_verified_p,
                    member_state => :member_state);
        end;

    while executing
"ns_ora exec_plsql_bind nsdb0 {
...etc.

ORA TRACE FILE:

*** 2002-01-08 18:41:52.589
*** SESSION ID:(7.33) 2002-01-08 18:41:52.570
Exception signal: 11 (SIGSEGV), code: 1 (Address not mapped to
object), addr: 0x90, PC: [0x894994b, kkeidc()+299]
*** 2002-01-08 18:41:53.158
ksedmp: internal or fatal error
ORA-07445: exception encountered: core dump [kkeidc()+299] [SIGSEGV]
[Address not mapped to object] [0x90] [] []
Current SQL statement for this session:
select /*+ FIRST_ROWS*/ constraint_id, constraint_name
      from rel_constraints_violated_one
      where rel_id = :b1
        and rownum = 1
----- PL/SQL Call Stack -----
  object      line  object
  handle    number  name
0x569fd9ac        72  package body INTRANET.REL_CONSTRAINT
0x569fd144        9  INTRANET.MEMBERSHIP_RELS_IN_TR
0x56a2a03c        26  package body INTRANET.MEMBERSHIP_REL
0x56bd5d40        35  package body INTRANET.ACS
0x56b0e38c        3  anonymous block
----- Call Stack Trace -----
calling              call    entry                argument values in
hex
location            type    point                (? means dubious
value)
-------------------- -------- -------------------- -------------------
---------
Exception signal: 11 (SIGSEGV), code: 1 (Address not mapped to
object), addr: 0x0, PC: [0x40480760, strcpy()+64]
*** 2002-01-08 18:41:53.607
ksedmp: internal or fatal error
ORA-07445: exception encountered: core dump [strcpy()+64] [SIGSEGV]
[Address not mapped to object] [0x0] [] []
ORA-07445: exception encountered: core dump [kkeidc()+299] [SIGSEGV]
[Address not mapped to object] [0x90] [] []

Thanks,

Rob

Whoa Rob, that's scary.  What OS?  Do you have access to Oracle MetaLink?
We don't currently support 9i, will with our second release cycle.  We do support 8.1.6 and 8.1.7, and 8.1.7 includes the INSO filters for InterMedia in the Linux version (finally!).

So there's no compelling need to use 9i at the moment.  The one problem I know about is that "delete" is now a reserved word and we have a ton of object types that define a procedure with that name, and a ton of references to them.  So we decided to hold off.

However ... I know at least one person, after experimenting with editing the package definitions for the kernel, who managed to get the kernel installed (he didn't have time to edit all the references not to mention all the other packages in the toolkit).

So ... something else is going on.

What's your stacksize parameter set to in your nsd.tcl file?  The driver needs a lot of stack space ...

I have installed 9i and it works fine. OACS will not work though as AD decided to use reserved words for functions in pl/sql (against some of us arguing that this was not correct).

this kind of error is usually a symptom of kernel not well configured, you should double check that the ipc parameters are well sized (maximum shared memory size, message queue parameter, ...). The Oracle installation on Linux should be quite similar than the one on Solaris.
DELETE has always been a reserved word in PL/SQL; also, NEW has been a reserved word since Oracle8.0 and there are many packages in ACS4.x that (re)define a procedure/function NEW. Now, PL/SQL as a block-structured programming language should obey the rules of scope and allow e.g. a procedure named DELETE inside a package named ACS_CONTENT. At least that is what Oracle 8.1.6 does. (Although I personally consider it a bad style to redefine reserved words in a programming language).

Since I don't have access to Oracle9i, could somebody (Robert?) run the following test and let us know what happens? Oracle8.1.6 for SUN Solaris allows the re-definition of the DELETE and NEW reserved words inside a user-defined package.


-- A package to test if the running Oracle version allows redefinition of the
-- NEW and DELETE reserved words
create or replace package test1 as
  function new (
    db_ver out VARCHAR2, 
    db_compatible out VARCHAR2
  ) return VARCHAR2;

  procedure delete(
    db_ver out VARCHAR2,
    db_compatible out VARCHAR2
  );
end test1;
/
show errors;
create or replace package body test1 as
  function new(
    db_ver out VARCHAR2, 
    db_compatible out VARCHAR2
  ) return VARCHAR2
  is 
  begin
    dbms_utility.db_version(db_ver,db_compatible);
    return 'User-defined function NEW allowed in Version '||db_ver;
  end new;
  
  procedure delete (
    db_ver out VARCHAR2,
    db_compatible out VARCHAR2
  )
  is
  begin
    dbms_utility.db_version(db_ver,db_compatible);
  end delete;    
end test1;
/
show errors;
set serveroutput on format wrapped size 1000000
declare 
 db_ver VARCHAR2(12);
 db_compatible VARCHAR2(12);
begin
test1.delete(db_ver,db_compatible);
dbms_output.put_line('User-defined procedure DELETE allowed in Version '||db_ver);
dbms_output.put_line(test1.new(db_ver,db_compatible));
end;
/


Reserved words are treated as special symbols and not subject to normal scoping rules in many languages, including Ada which served as the model for PL/SQL.  Try defining "function begin" or "function end" in PL/SQL for two other examples of reserved words not allowed as identifiers.  That's why they're called "reserved" ...

(hope I don't come across as being too annoying, in my evil past I was a professional compiler writer)

"update" wasn't allowed even in 8i, and aD knew it (there's a comment in one of the packages saying that "'edit' is used as a name because Oracle won't let us use 'update'"

So it's a bit distressing to hear Jon mention that arguments against the use of "delete" and other reserved words that were incorrectly being allowed was ignored.

Oh well...we'll change 'em to "remove" or something.

First all, thanks for the responses.

Here are my answers:

C.R.: OS - Red Hat 7.2, Oracle MetaLink - Not sure, will find out.

Don:

I've changed all (hopefully) occurances of delete to del with the Kernel and Core scripts and it uploads ok.  The stack size is set to 500000.

Michel:

I think i've changed all the relavent Kernel parameters: sem is 250 32000 100 128, shmmax is 2147483648 and shmni is 4096.  Any others I should be looking for?

Simos:

I ran your script and it work:

Package created.

No errors.

Package body created.

No errors.
User-defined procedure DELETE allowed in Version 9.0.1.0.0
User-defined function NEW allowed in Version 9.0.1.0.0

PL/SQL procedure successfully completed.

Any other ideas?

Although it seems that Oracle9i allows user-defined DELETE and NEW procedures/functions, I agree with Don and Jon that reserved words should never be used as identifiers (or procedure/function names).

On a similar topic in the ArsDigita bboard, Hal Heisler suggested: "I changed references in the Makefile for the driver from -lcore8 to -lcore9 etc.. It built fine after that."

(See: http://www.arsdigita.com/bboard/q-and-a-fetch-msg?msg%5fid=000iiI&topic%5fid=21&topic=web%2fdb).

I hope this helps a bit.

I've made a break through.

Instead of looking at the OS/Oracle setup I decided to have a look at the SQL.  After a couple of hours I tracked the problem to the "violation" function in the "rel_constraints" package.

There are two SELECT statements accessing the views rel_contraints_violated_one and rel_constraints_violated_two, both of these cause an EOF on Comms Channel error.  This error is also generated by performing a simple SELECT on these views within SQLPLUS.

It looks to me as if there is something wrong with the views but I'll let you know what I find.

about shm parameters : here are the settings of my Solaris system
set shmsys:shminfo_shmmax=4294967295
set shmsys:shminfo_shmmin=1
set shmsys:shminfo_shmmni=100
set shmsys:shminfo_shmseg=10
set semsys:seminfo_semmni=100 set semsys:seminfo_semmsl=100
set semsys:seminfo_semmns=220 set semsys:seminfo_semopm=100
set semsys:seminfo_semvmx=32767
Please check the exact spelling, I did make a subtle typo once. hope that help, Oracle is not user friendly in case of 03113.
This appears to be a gen-u-ine Oracle bug.  That core dumps the dang thing!  Very cool.  My congratulations to the aD hacker who wrote that view.

Folks should take a look at Julian's note, as he points out that he thinks the queries are bogus to begin with.

Julian,

You may try to rewrite the two views and to explicitly specify the column names, i.e.:


create or replace view rel_constraints_violated_one as 
select constraint_id, constraint_name, rel_id, container_id, party_id,
  rel_type, rel_segment, rel_side, required_rel_segment
from ...

create or replace view rel_constraints_violated_two as 
select constraint_id, constraint_name, rel_id, container_id, party_id,
  rel_type, rel_segment, rel_side, required_rel_segment
from ...

I read somewhere that the "lazy" way of creating views (e.g. select constrained_rels.* instead of explicitly specifying the column names) results in Oracle crashing when a view's base table has been dropped and recreated with columns in different order or with extra columns. Bottom line: always, explicitly specify the columns when creating a view.
Regarding my last suggestion, I believe it should be:

create or replace view rel_constraints_violated_one as select constrained_rels.constraint_id, constrained_rels.constraint_name, constrained_rels.rel_id, constrained_rels.container_id, constrained_rels.party_id, constrained_rels.rel_type, constrained_rels.rel_segment, constrained_rels.rel_side, constrained_rels.required_rel_segment from ...

and similar for the rel_constraints_violated_two view.

Did you read Julian's note?  He can cure the core dump by removing an outer join, and these tables aren't being dropped and recreated either.
Well, Don, I guess my last post was more relevant as a suggestion on the big list of improvements that OpenACS4 can bring to ACS4!

This discussion thread has revealed that bad design decisions (using reserved words as identifiers) and sloppy coding (NULL=NULL) may work for a while but they are really disasters waiting to happen!

Keep up the good work!

Oh, I agree that listing columns to be returned from a view is good style.  I'm just trying to get your (or someone else) to look at Julian's post to see if he's right about being able to drop the outer join from the query altogether.

I'm busy busy today and tomorrow.

Don, I don't think the outer join can be dropped altogether, only the ambiguity of using the equality operator (=) on NULL columns; so, for the rel_constraints_violated_one view I think it should be:
where rspm.segment_id(+) = constrained_rels.required_rel_segment
and constrained_rels.container_id is null
and rspm.party_id is null;
and for the rel_constraints_violated_two view:
where rspm.segment_id(+) = constrained_rels.required_rel_segment
and constrained_rels.party_id is null
and rspm.party_id is null;