Do you have any idea how frequently these downloads fail, as a percentage of requests?
There was a big mystery in ACS some years back where Oracle, or the AOLserver Oracle driver, would occasionally throw some sort of 0-byte write error.
The mystery was solved when someone finally proved that dropping the connection while content was being returned would trigger the error. People still see these errors on Oracle sites for this reason.
I'm sure you're running PG, of course. I bring this up to make clear that even with today's networks and high-speed connections, socket connections do get dropped at times.
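If you want to see the general mechanism for yourself, here's a minimal Python sketch. It has nothing to do with the actual AOLserver or Oracle driver code, and the function name is made up for illustration. It closes one end of a local socket pair and then keeps writing to the other end, which is roughly the position the server is in when a client goes away mid-download. I'd expect the write to fail with a broken-pipe or connection-reset style error, though the exact exception depends on the platform.

```python
import socket


def send_after_peer_close(payload: bytes, chunks: int = 50) -> str:
    """Write to a socket whose peer has already closed.

    Returns the name of the exception the write raised, or "no error"
    if every write somehow succeeded. Hypothetical helper, for
    illustration only.
    """
    # A connected pair of local sockets stands in for server and client.
    server, client = socket.socketpair()

    # The "client" drops the connection before the content is sent.
    client.close()

    try:
        # The "server" keeps returning content, unaware the peer is gone.
        for _ in range(chunks):
            server.sendall(payload)
    except OSError as exc:
        # Typically BrokenPipeError or ConnectionResetError.
        return type(exc).__name__
    finally:
        server.close()
    return "no error"


if __name__ == "__main__":
    print(send_after_peer_close(b"x" * 4096))
```

The point is only that a write error on the server side is the normal, expected result of a dropped client, so the interesting question is what the code does after it gets one.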
What's weird here is that the thread's still running. That may mean that there's no chance in hell that sockets are being dropped (oh, and if a very high percentage of downloads are failing that probably rules out socket drops too).
Or it may mean that there's some messed-up recovery code in AOLserver, our code, both, whatever.
Are you returning this content from the file system, or from PG? Returning binary files from PG (more properly, through the driver hack I implemented long ago to work around PG's poor binary file handling) is SLOW, which is why we recommend mapping CR binary content to the file system. ns_returnfile should take very little CPU time.