Forum OpenACS Q&A: Re: A few general questions

6: Re: A few general questions (response to 5)

Posted by Gustaf Neumann on 06/08/11 09:56 AM

Concerning the use of the Tcl event-loop in aolserver:

When the plain Tcl event loop is used in aolserver threads (e.g. connection threads), there are two problems: (1) under heavy load, we observed blocking (connection threads stop processing events), and (2) connection threads are expensive and a scarce resource: when you have e.g. five connection threads defined, and all five run into a request that enters the event loop (i.e. a "vwait"), the server cannot accept further requests. Long-running connection threads are in general something to avoid in aolserver if you want scalability.

The Tcl event loop works perfectly within aolserver when the Tcl thread library is loaded (see http://www.openacs.org/xowiki/libthread).

The background delivery mechanism (http://www.openacs.org/xowiki/Boost_your_application_performance_to_serve_large_files!) is based on the Tcl thread library. We often deliver several hundred thousand files per day via background delivery on our production system, including pseudo-streaming for mp4 files (which requires local rewriting of the video stream when someone jumps to an arbitrary position). The advantage of the Tcl thread library is that, with event-based processing, a single thread can simultaneously deliver several thousand streams with different transfer rates without blocking request threads. We use multiple Tcl threads for different purposes on our production system without any problems.

7: Re: A few general questions (response to 6)

Posted by ultra newb on 06/08/11 03:41 PM

So short answer... use the aolserver version if I care about scalability.

Thanks.

8: Re: A few general questions (response to 7)

Posted by Gustaf Neumann on 06/09/11 10:41 AM

No, the short answer is: if you care about scalability, don't block your connection threads; use background delivery and friends.

What do I mean by scalability? We often have more than 10,000 users concurrently logged in, more than 2,000 of them concurrently active. At this scale, we frequently see 200 views per second (and about 5 times this number as hits).

Say the server has 10 connection threads configured. If e.g. a request is delivering a large file, the time to finish this request depends on the connection quality between the server and the client (which you can't influence). For a client with a good connection, time-to-finish might be e.g. 0.5 secs; for one with a bad connection, e.g. 10 secs, or a minute. So, without background delivery, the connection thread might be blocked for 10 secs, 1 min... Suppose there are 10 clients requesting the file over bad connections at about the same time. In this case, all 10 connection threads will be occupied for this time, and the server won't be able to serve any other requests. If we serve e.g. 100 requests per sec, the 10 sec case means that 1000 requests have to be queued (for 1 min: 6000). Increasing the number of connection threads by a factor of 2 or 5 does not change the picture if really slow operations can occupy all connection threads.
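The arithmetic above can be written down as a small sketch. It is only an illustration in Python, not aolserver code; the function names queued_requests and free_threads are made up for this example.

```python
def queued_requests(requests_per_sec, blocked_secs):
    """Requests that pile up while every connection thread is blocked."""
    return requests_per_sec * blocked_secs


def free_threads(connection_threads, slow_clients):
    """Connection threads still available while slow clients each hold one."""
    return max(0, connection_threads - slow_clients)


print(queued_requests(100, 10))   # 1000 requests queued in the 10 sec case
print(queued_requests(100, 60))   # 6000 requests queued in the 1 min case
print(free_threads(10, 10))       # 0: ten slow clients occupy all ten threads
print(free_threads(50, 50))       # 0: five times the threads, same picture
```

The last line is the point about adding threads: as soon as the number of slow clients reaches the number of threads, nothing is left to serve other requests.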

With background delivery, the processing time in a connection thread is in the range of milliseconds, independently of the connection quality of the client. Therefore, one can keep the number of connection threads (and therefore the memory footprint) low and ensure scalability (for this kind of load).
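To make the idea concrete, here is a minimal sketch of the handoff pattern, written in Python with asyncio as a stand-in. It is not the OpenACS background delivery code and uses no aolserver API; the class BackgroundDelivery, its methods, and the in-memory "sink" lists that play the role of client sockets are all assumptions for illustration. The caller (think: connection thread) only schedules the transfer and returns; a single background thread with an event loop spools the data to many clients at different rates.

```python
import asyncio
import threading


class BackgroundDelivery:
    """One thread running an event loop that spools data to many slow clients."""

    def __init__(self):
        self.loop = asyncio.new_event_loop()
        self.thread = threading.Thread(target=self.loop.run_forever, daemon=True)
        self.thread.start()

    async def _spool(self, data, chunk_size, delay, sink):
        # Write chunk by chunk; the sleep stands in for waiting on a slow client.
        for i in range(0, len(data), chunk_size):
            sink.append(data[i:i + chunk_size])
            await asyncio.sleep(delay)
        return len(data)

    def deliver(self, data, chunk_size, delay, sink):
        # Called from a "connection thread": schedule the transfer, return at once.
        return asyncio.run_coroutine_threadsafe(
            self._spool(data, chunk_size, delay, sink), self.loop)

    def stop(self):
        self.loop.call_soon_threadsafe(self.loop.stop)
        self.thread.join()
        self.loop.close()


bg = BackgroundDelivery()
sinks = [[] for _ in range(50)]                      # 50 simulated clients
futures = [bg.deliver(b"x" * 1000, 100, 0.001 * (i % 5), sinks[i])
           for i in range(50)]                       # different transfer rates
sent = [f.result(timeout=10) for f in futures]       # wait only for the demo
bg.stop()
print(sum(sent))                                     # 50000 bytes delivered
```

In a real server the connection thread would not wait for the result; it would return to the pool right after the call to deliver.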

The numbers above are in some respects conservative figures; when a site serves e.g. video content, the delivery times might be much larger.

What does this have to do with your question? If you have a request that has to fetch content from a different site (via ns_sock or whatever), you are in a similar situation if you don't know the connection quality or the size of the content that has to be transferred.

Recommendations: try to occupy connection threads as little as possible; if you have confidence in the performance of transfers of content from other sites within a connection thread, use ns_socket and friends, and try to cache transferred content if possible; if you care about scalability, decouple spool time from processing time and use Tcl threads and async I/O.

Hope this helps.

9: Re: A few general questions (response to 8)

Posted by ultra newb on 06/09/11 08:39 PM

Would help if I were at a higher level as far as understanding OpenACS and AOLServer, but I'm not :-)

Could you boil your "recommendations" to a simple "use X, do not use Y?" :-)

Question: If I use the "file-procs" method described above to get my custom procs loaded in and available, is this file resourced every time I use the proc, or is it just sourced one time?

Thanks.

10: Re: A few general questions (response to 9)

Posted by Gustaf Neumann on 06/10/11 09:13 AM

OK, I'll try once again, three rules:

(a) The easiest approach is to use ns_socket and friends.

(b) Don't use Tcl socket operations in connection threads, since this might result in lockups under high load.

The approach (a) requires the least knowledge on your side. Whether or not it is sufficiently scalable depends on your application and setup. If you care about scalability, try to guarantee short processing times in your connection threads. Socket operations tend to depend on other servers, so it is hard to give bounds on their duration, and therefore scalability degrades.

If (a) is not sufficiently scalable, use (c) async I/O based on Tcl's non-blocking I/O in a separate thread based on the Tcl thread library.

The approach (c) is used by the background delivery (as I tried to explain above), guaranteeing short processing times in the connection threads.

The *-procs.tcl files are sourced at server start time, not per usage (that would be very inefficient). During startup, aolserver builds a "blueprint" containing all library procs. This blueprint is used to initialize every connection thread (the threads processing incoming requests). Therefore, if one wants to update these procs in the threads in memory, one has to reload them explicitly via the admin interface (see above), or one has to restart the server.
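As a toy model of this behaviour (in Python, and not how aolserver is implemented; ToyServer and all its names are made up for illustration): the library definitions are read once into a blueprint, each thread starts from a copy of it, and a later change to the source only reaches the threads after an explicit reload.

```python
class ToyServer:
    """Toy model: procs are loaded once into a blueprint, then copied per thread."""

    def __init__(self, files):
        self.files = files               # stands in for the *-procs.tcl files on disk
        self.blueprint = dict(files)     # built once, at "server start"
        self.threads = []

    def start_thread(self):
        interp = dict(self.blueprint)    # each thread is initialized from the blueprint
        self.threads.append(interp)
        return interp

    def reload(self):
        # Explicit reload: rebuild the blueprint and update the running threads.
        self.blueprint = dict(self.files)
        for interp in self.threads:
            interp.update(self.blueprint)


server = ToyServer({"greet": lambda: "v1"})
thread = server.start_thread()

server.files["greet"] = lambda: "v2"     # "edit the file on disk"
print(thread["greet"]())                 # v1: the thread still has the old proc

server.reload()
print(thread["greet"]())                 # v2: visible only after the reload
```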