Forum OpenACS Q&A: Re: How to correctly configure the reverse proxy.

Posted by Gustaf Neumann on
As in my last replies, I recommend upgrading to NaviServer 4.99.20 (.18 is two years old, and many things have changed in this area). However, I can answer your specific questions.

What timeout configuration is this warning referring to?

I am not sure you want to hear this level of detail, but a short answer would raise more questions. When debugging is turned on, the system exposes details that an application developer normally does not have to care about, but it also provides debugging hints. When a socket says "would block, no timeout configured", this means that on a read/write operation the OS kernel reported that it cannot right now read/write what the user-space program (here NaviServer) requested in non-blocking mode. For this concrete read/write operation, no timeout was provided, which is the intended behavior for non-blocking socket operations in the socks thread (the same function is also used in other contexts). If the socket were defined as blocking, it would block at this point. Avoiding blocking operations is essential for the socks thread, since otherwise all other concurrent operations in the same thread would halt as well.
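To make the "would block" condition concrete, here is a minimal sketch in Python rather than in NaviServer's C code. It only illustrates the general OS behavior; the helper names `try_read` and `demo_would_block` are made up for this example and have nothing to do with the NaviServer sources.

```python
import select
import socket


def try_read(sock, nbytes=4096):
    """Attempt one non-blocking read; return None if it would block."""
    try:
        return sock.recv(nbytes)
    except BlockingIOError:
        # The kernel reported EAGAIN/EWOULDBLOCK: no data available right now.
        return None


def demo_would_block():
    a, b = socket.socketpair()
    try:
        a.setblocking(False)
        first = try_read(a)               # nothing was sent yet: would block
        b.sendall(b"ping")
        select.select([a], [], [], 1.0)   # wait until the socket is readable
        second = try_read(a)              # now the read succeeds
        return first, second
    finally:
        a.close()
        b.close()
```

The first read returns immediately with "would block" instead of stalling the calling thread, which is the property an event-driven thread serving many sockets depends on.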

The situation with "partial writes" is very similar: the user-space program wants to write N bytes, but the kernel accepts only X bytes (where X < N). The kernel can do so whenever it wants; this is allowed behavior under POSIX (see e.g. [1]). Some OSes have so far done this rarely (macOS), while others do it often (Linux, especially in newer versions).
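A partial write is easy to provoke with a non-blocking socket whose peer does not read. The following is a small sketch under that assumption; `demo_partial_write` is a hypothetical helper, and the number of bytes the kernel accepts depends on the OS and its buffer sizes.

```python
import socket


def demo_partial_write(total=16 * 1024 * 1024):
    """Try to send `total` bytes at once on a non-blocking socket."""
    a, b = socket.socketpair()
    try:
        a.setblocking(False)
        payload = b"x" * total
        # The peer never reads, so the kernel buffer fills up and send()
        # reports how many bytes it actually accepted (X < N).
        sent = a.send(payload)
        return sent, total
    finally:
        a.close()
        b.close()
```

The return value of `send()` is the only reliable indication of how much was written; the remaining `total - sent` bytes are still the caller's responsibility.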

To cope with these conditions, the user-level program has to retry the read/write (on our sockets: receive/send) operations with the remaining data whenever the socket becomes readable or writable again. Handling these operations in Tcl is unfortunately somewhat tricky, and it is made more complex for TLS connections, where an incomplete write operation sometimes means that one has to wait for readability of the socket (since OpenSSL still performs some negotiation under the hood). On the Tcl side we have two problem areas: first, where the remaining data should be stored, for how long, and what to do if more data arrives in the meantime; second, how to deal with binary/text data, especially when the byte stream is split inside a multi-byte character and the like. Earlier versions of revproxy solved this via nsv, switching between different callbacks on the fly, etc., while newer versions use newer socket functions in NaviServer that provide internal storage for the left-over bytes, so that the Tcl side does not have to care about this.
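Both problem areas can be sketched in a few lines. The following Python example is only an illustration of the general technique and not the revproxy or NaviServer implementation: the hypothetical `BufferedSender` class keeps the left-over bytes next to the socket and retries when the socket becomes writable, and the receiving side uses an incremental UTF-8 decoder so that a multi-byte character split across two reads is not corrupted. TLS is left out entirely.

```python
import codecs
import selectors
import socket


class BufferedSender:
    """Stores unsent bytes with the socket, so callers need not track them."""

    def __init__(self, sock):
        self.sock = sock
        self.pending = bytearray()

    def queue(self, data):
        self.pending += data

    def flush_once(self):
        """Try one send of the left-over data; return True when none is left."""
        if self.pending:
            try:
                sent = self.sock.send(self.pending)
            except BlockingIOError:
                sent = 0              # would block: retry on next writable event
            del self.pending[:sent]   # keep only the part not yet written
        return not self.pending


def transfer_text(text, chunk=1021):
    """Send text over a non-blocking socket pair and decode it on arrival."""
    a, b = socket.socketpair()
    a.setblocking(False)
    b.setblocking(False)
    sender = BufferedSender(a)
    sender.queue(text.encode("utf-8"))
    decoder = codecs.getincrementaldecoder("utf-8")()
    received = []
    sel = selectors.DefaultSelector()
    sel.register(a, selectors.EVENT_WRITE)
    sel.register(b, selectors.EVENT_READ)
    try:
        while True:
            events = sel.select(timeout=5)
            if not events:
                raise TimeoutError("no socket became ready")
            for key, _ in events:
                if key.fileobj is a:
                    if sender.flush_once():
                        sel.unregister(a)
                        a.shutdown(socket.SHUT_WR)   # signal end of data
                    continue
                try:
                    data = b.recv(chunk)
                except BlockingIOError:
                    continue
                if not data:                         # peer finished sending
                    received.append(decoder.decode(b"", final=True))
                    return "".join(received)
                # Bytes of an incomplete character are held back by the decoder.
                received.append(decoder.decode(data))
    finally:
        sel.close()
        a.close()
        b.close()
```

For example, `transfer_text("Grüße € " * 50000)` should return the original string even though the payload is larger than a typical socket buffer and the reads cut through multi-byte characters. Keeping the pending bytes inside the sender object mirrors the idea of storing left-over data with the socket instead of in application-level state.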

So, under the hood, the code in 4.99.20 is very different from earlier versions, exactly to reduce the complexity sketched above. The situation is similar for WebSockets. The good old WebSocket implementation of NaviServer worked perfectly fine for years for e.g. chat and similar low-traffic, low-volume operations, but when we started to use it for shared rooms with VR glasses (Oculus) and WebVR/WebXR, where a substantial amount of data is sent on every movement, the old implementation ran into many problems with partial reads/writes etc. The WebSocket code now uses exactly the same C-level operations.

So, sorry for being lengthy. I hope this explains my reduced enthusiasm for digging into old code, since sooner or later you will want to upgrade.