Forum OpenACS Q&A: ORACLE occasional stream_write_lob error when downloading files

Hi all!
I'm using the lastest release of the ACES that aD did. Sometimes I got this error, presumably it happens when someone is downloading a file, but now always happen.
Here is the error:
[07/Mar/2002:09:41:15][20044.1106956][-conn273-] Error: ora8.c:4472:stream_write_lob error writing to connection. incomplete write of 0 out of 16384

I found a http://www.arsdigita.com/bboard/q-and-a-fetch-msg?msg%5fid=000Zde&topic%5fid=21&topic=web%2fdb thread at aD bboard. I followed what it said but the error still happens.

I'm using:
Oracle 8.1.6
Aolserver 3.3+ad13
RedHat 7.1
On a Celeron Intel box.

Please, if someone can help me I will appreciate it, because right now I don't know what's going on.
=(
This error makes my aolserver entirely crash, so I need to restart it!
Ouch ... I've never seen AOLserver crash because of this, though.  What version of AOLserver are you using?

Unfortunately you can't get rid of these errors.  They are caused by the socket closing, which usually means the user got tired of waiting for the download to complete or otherwise lost the connection.

Aolserver 3.3+ad13

I can't believe that there isn't a workaround this problem / bug!

The dying of the write isn't a bug at all ... when the socket dies, it dies, and that's just the way it is.  AOLserver can't stop the end from hanging up the modem, or closing the browser, or otherwise causing the socket to close.

Now ... the dying is something else again.  How exactly is it crashing?  Do you get any other log messages at all?

Has anyone else seen this behavior?

Here is the log:

[07/Mar/2002:09:40:22][20044.1100807][-conn270-] Notice: bind variable 'user_id' = '1364'
[07/Mar/2002:09:40:22][20044.1100807][-conn270-] Notice: SQL():  select
 decode(ad_general_permissions.user_has_row_permission_p(1364, 'write', '33', 'FS_VERSIONS'), 't', 1, 0)
from dual
[07/Mar/2002:09:40:22][20044.1100807][-conn270-] Notice: SQL():
    select desired.file_id,
           file_title,
           folder_p,
           depth * 24 as n_pixels_in,
           round (fsvl.n_bytes / 1024) as n_kbytes,
           n_bytes,
           to_char (fsvl.creation_date, 'MM/DD/YY HH24:MI') as creation_date,
           nvl (fsvl.file_type, upper (fsvl.file_extension) || ' File') as file_type,
           sort_key,
           fsvl.version_id,
           fsvl.client_file_name,
           fsvl.url
    from   fs_versions_latest fsvl,
           (
select file_id,
       file_title,
       sort_key,
       depth,
       folder_p,
       owner_id,
       group_id,
       public_p,
       parent_id,
       level as the_level
from   fs_files
where  deleted_p = 'f'
connect by parent_id = prior file_id start with file_id = :file_id
) desired
    where  fsvl.file_id = desired.file_id
    and    (ad_general_permissions.user_has_row_permission_p (:local_user_id, 'read', fsvl.version_id, 'FS_VERSIONS') = 't' o
r owner_id = :local_user_id or folder_p = 't')
    order by sort_key
[07/Mar/2002:09:40:22][20044.1100807][-conn270-] Notice: bind variable 'file_id' = '33'
[07/Mar/2002:09:40:22][20044.1100807][-conn270-] Notice: bind variable 'local_user_id' = '1364'
[07/Mar/2002:09:40:22][20044.1100807][-conn270-] Notice: bind variable 'local_user_id' = '1364'
[07/Mar/2002:09:40:48][20044.1106956][-conn273-] Notice: SQL():  select file_type, n_bytes from fs_versions where version_id 
= :version_id
[07/Mar/2002:09:40:48][20044.1106956][-conn273-] Notice: bind variable 'version_id' = '43'
[07/Mar/2002:09:40:48][20044.1106956][-conn273-] Notice: SQL():
        select group_id from fs_files fsf, fs_versions fsv
        where version_id = :version_id and fsf.file_id = fsv.file_id
[07/Mar/2002:09:40:48][20044.1106956][-conn273-] Notice: bind variable 'version_id' = '43'
[07/Mar/2002:09:40:48][20044.1106956][-conn273-] Notice: SQL():
        select decode(count(*),0,0,1) from user_group_map where user_id = :user_id and group_id = :group_id

[07/Mar/2002:09:40:48][20044.1106956][-conn273-] Notice: bind variable 'user_id' = '1364'
[07/Mar/2002:09:40:48][20044.1106956][-conn273-] Notice: bind variable 'group_id' = '33'
[07/Mar/2002:09:40:48][20044.1106956][-conn273-] Notice: SQL():  select count(fs_files.file_id)
                                          from fs_files, fs_versions_latest
                                          where fs_files.file_id = fs_versions_latest.file_id
                                          and fs_files.owner_id = :user_id and version_id = :version_id
[07/Mar/2002:09:40:48][20044.1106956][-conn273-] Notice: bind variable 'user_id' = '1364'
[07/Mar/2002:09:40:48][20044.1106956][-conn273-] Notice: bind variable 'version_id' = '43'
[07/Mar/2002:09:40:48][20044.1106956][-conn273-] Notice: SQL():  select
 decode(ad_general_permissions.user_has_row_permission_p(1364, 'read', '43', 'FS_VERSIONS'), 't', 1, 0)
from dual
[07/Mar/2002:09:40:48][20044.1106956][-conn273-] Notice: SQL():  select version_content
                                 from   fs_versions
                                 where  version_id = 43
[07/Mar/2002:09:41:15][20044.1106956][-conn273-] Error: ora8.c:4472:stream_write_lob error writing to connection.  incomplete
 write of 0 out of 16384
[07/Mar/2002:09:43:09][20044.1108999][-sched:4-] Notice: Running scheduled proc gc_send_bboard_alerts...
[07/Mar/2002:09:43:09][20044.1108999][-sched:4-] Notice: Sending alerts for gc-bboard on http://eeduk.unis.edu.gt
[07/Mar/2002:09:45:02][20044.1110026][-sched:11-] Notice: Running scheduled proc process_email_queue...
[07/Mar/2002:10:02:13][20044.1115152][-sched:19-] Notice: Running scheduled proc send_scheduled_spam_messages...
[07/Mar/2002:10:02:13][20044.1115152][-sched:19-] Notice: running scheduled spam sending daemon
[07/Mar/2002:10:05:18][20044.2051][-sched-] Notice: Running scheduled proc sec_sweep_sessions...
[07/Mar/2002:10:15:17][4533.1024][-main-] Notice: unix: waiting for killed process 20044 to die...
[07/Mar/2002:10:15:18][4533.1024][-main-] Notice: unix: waiting for killed process 20044 to die...
[07/Mar/2002:10:15:19][4533.1024][-main-] Notice: unix: waiting for killed process 20044 to die...
[07/Mar/2002:10:15:20][4533.1024][-main-] Notice: unix: waiting for killed process 20044 to die...
[07/Mar/2002:10:15:21][4533.1024][-main-] Notice: unix: waiting for killed process 20044 to die...
[07/Mar/2002:10:15:22][4533.1024][-main-] Notice: unix: waiting for killed process 20044 to die...
[07/Mar/2002:10:15:23][4533.1024][-main-] Notice: unix: waiting for killed process 20044 to die...
[07/Mar/2002:10:15:24][4533.1024][-main-] Notice: unix: waiting for killed process 20044 to die...
[07/Mar/2002:10:15:25][4533.1024][-main-] Notice: unix: waiting for killed process 20044 to die...
[07/Mar/2002:10:15:26][4533.1024][-main-] Notice: unix: waiting for killed process 20044 to die...
[07/Mar/2002:10:15:27][4533.1024][-main-] Notice: unix: waiting for killed process 20044 to die...
[07/Mar/2002:10:15:28][4533.1024][-main-] Warning: unix: attempting again to kill process 20044 after waiting 10 seconds
[07/Mar/2002:10:15:28][4533.1024][-main-] Notice: unix: waiting for killed process 20044 to die...

I see that the error ocurrs when somebody is trying to upload a file, any suggestions?
I am guessing here as I have never seen this. How much disk/swap space do you have? Temp partition sizes. Also, how are you running your services? inittab/rc or daemontools.
How much disk/swap space do you have? Temp partition sizes. 

   Device Boot    Start       End    Blocks   Id  System
/dev/hda1   *         1         2     16033+  83  Linux
/dev/hda2             3      2481  19912567+   5  Extended
/dev/hda5             3       133   1052226   82  Linux swap
/dev/hda6           134      2481  18860278+  83  Linux


Filesystem            Size  Used Avail Use% Mounted on
/dev/hda6              18G   11G  6.1G  64% /
/dev/hda1              15M  3.5M   10M  24% /boot

There is about 1Gb of Swap and I have 512MB RAM
typing free in the shell I got this

             total       used       free     shared    buffers     cached
Mem:        512344     507260       5084          0       3652     241376
-/+ buffers/cache:     262232     250112
Swap:      1052216     210516     841700



how are you running your services? 
with inittab/rc 
I just followed the stardard ACS installation instructions, so that's the way it is as far as I remember.

Jon, this tell you something?

It's a tiny file ... that shouldn't be the problem.  And I assume you have stacksize set correctly in the nsd.tcl file, as per the install instructions?

We've seen the incomplete write error plenty of times on one site but it's never crashed it.  But that was 8.1.6 on RH 6.2 ...

Don, this is the actual stacksize

[ns/parameters]
User=nsadmin
ServerLog=/home/aol33/log/unis-3-error.log
Home=/home/aol33
StackSize=500000
auxconfigdir=/web/unis-3/parameters


It is correct? or should I increase it?
You problem may be solved by upgrading the Oracle DB to a 8.1.7 release (preferably 8.1.7.3). Checking Metalink, I see that in 8.1.7.1 they've fixed Bug 1418809, tersely described as:
OCI insert of a LOB can crash the client
Indeed, if I remember correctly, the AOLServer driver for Oracle is OCI-based. There's no guarantee that your problem will be fixed by upgrading Oracle alone, but I think it could be worthwhile.
Thanks Sebastiano,
I have a couple of questions to you.
1. Is there an available patch to solve this problem?, so I can avoid the upgrade
2. How can I upgrade to 8.1.7.3? any docs that may help? what do I with the old installation? consider that this is a production system, so I need to be very careful with the data and I can't be offline a long time.

bye, =)
The Oracle driver is indeed OCI based ...
1. Is there an available patch to solve this problem?, so I can avoid the upgrade 2. How can I upgrade to 8.1.7.3? any docs that may help? what do I with the old installation? consider that this is a production system, so I need to be very careful with the data and I can't be offline a long time.
  1. The upgrade is the patch, AFAIK. Oracle usually doesn't backport fixes, unless you're a very big customer with a peculiar support contract. If I was in your position, I'd try getting a 8.1.7 upgrade via your Oracle sales contact, and then update it to 8.1.7.3 via the patchsets available via anonymous FTP from oracle-ftp.oracle.com/server/patchsets
  2. I share your concerns about the availability of your production machine. I'd try cloning the DB to another machine and perform the upgrade/migration on that; there is plenty of documentation on the web, either from Oracle (be sure to read the release notes!) or third parties. Check for example Howard J. Rogers' site and the Oracle cooperative FAQ. Search also Google groups, you'll find lots of questions and answers about Oracle upgrades and migrations.
Well, actually we have followed the problem close, we found that it happens mostly when you are uploading a file to the system, even if you are working with a machine that has aolserver, aces, and oracle in the same box.

Now my question will go in this way, if I want not to upgrade oracle, I would like to install it again, but oracle 8.1.7 this time, what should I do? and if I dump the DB that is in 8.1.6, will it work in 8.1.7?

What do you recomend as the better way to solve my problem?

Well, actually we have followed the problem close, we found that it happens mostly when you are uploading a file to the system, even if you are working with a machine that has aolserver, aces, and oracle in the same box.
It shouldn't matter if AOLServer and Oracle are on the same box or are separated by 3 continents, as long as they communicate via OCI you should experience the problem.
Now my question will go in this way, if I want not to upgrade oracle, I would like to install it again, but oracle 8.1.7 this time, what should I do? and if I dump the DB that is in 8.1.6, will it work in 8.1.7?
If installing on the same machine, install 8.1.7 under another Oracle home (or under another account if you like), then use exp from the 8.1.6 instance and imp on the 8.1.7 instance. This SHOULD work, but absolutely test everything on a staging machine before doing anything on your production configuration.
Thanks Sebastiano,
which directory name do you recomend me to use to install the new oracle in the same box?

I guess I shouldn't run both databases at a time, so how to make svrgmrl to know which DB I'm refering to when I try to stop / start it?

You can run more than one Oracle instance at a time on the same box - providing you have enough disk space and memory. Before invoking svrmgr, set the ORACLE_SID enviroment variable to the one you want to log into using whatever syntax is appropriate to your OS/shell.
Thanks Cynthia, and about the new Oracle home what should I choose ora87 is ok? becuase right now I have ora8
ora87 is fine.  Personally, I have an /ora8 and an /ora817.