Forum OpenACS Q&A: Server melting down

Posted by Jarkko Laine on 09/13/03 04:01 PM

We hosted an orienteering meeting today and for some reason nsd seemed to melt down very easily, giving these errors:
[13/Sep/2003:14:48:13][28967.3076][-thread3076-] Warning: serv: no free connections, dropping this one, total so far: 0

Is this a configuration error or what? I can't imagine a few thousand visitors killing OACS this easily. Hardware or the db shouldn't be the bottleneck, since there are plenty of other sites on the same server that worked fine the whole time.

One thing I noted was that the size of nsd got quite high in a few minutes, somewhere up to 170MB. After restarting it was down to some 15MB.

2: Re: Server melting down (response to 1)

Posted by Dave Bauer on 09/13/03 05:34 PM

What is maxthreads set to on your server? Seems like AOLserver was handling a large number of simultaneous threads. 170 MB isn't that huge for a busy AOLserver process.

3: Re: Server melting down (response to 1)

Posted by Jarkko Laine on 09/13/03 06:36 PM

It wasn't set at all. Don't know what the default is, though. D'you mean that the current value is too big? What would be a reasonable value for a busy site?

4: Re: Server melting down (response to 3)

Posted by Andrew Piskorski on 09/14/03 09:53 AM

With AOLserver 3.3+ad13 I believe the defaults are minthreads 0, maxthreads 20, threadtimeout 120. No, you probably want to increase your values for all of those.

Good values are reportedly very hardware and operating system specific, but since most of us are using Linux with a 2.4.x kernel, it would be nice to see AOLserver tuning reports there. I haven't seen any though.

Since you are sharing the server with other OpenACS instances, maybe setting minthreads 5, maxthreads 30, threadtimeout 600 would be good for you. If it was the only site on the box, perhaps minthreads 35, maxthreads 35 would be more appropriate. But I don't really know for sure, I just made up those numbers.
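
As a rough sketch of where those settings would live, assuming the usual AOLserver config file layout with a per-server "ns/server/..." section. The server name below is made up, and the numbers are just the guessed shared-box values from above, not tested recommendations:

```tcl
# Hypothetical fragment of an AOLserver config file (nsd reads it at startup).
set server "myserver"            ;# assumed name; use your instance's own

ns_section "ns/server/${server}"
ns_param   minthreads    5       ;# threads started when the server starts
ns_param   maxthreads    30      ;# upper bound on connection threads
ns_param   threadtimeout 600     ;# seconds an idle thread waits before exiting
```

nsd needs a restart to pick up the change.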

5: Re: Server melting down (response to 4)

Posted by Jarkko Laine on 09/16/03 11:25 AM

Well, I made up numbers, too, and finally set both min and maxthreads to 50 and pushed up the connections of all db pools to 5 (it was previously 2 for subquery).
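
For reference, a minimal sketch of what that change looks like in the config file. The server name is a placeholder, and the other pool parameters (driver, datasource, user and so on) are assumed to already be set in the same pool section:

```tcl
# Hypothetical fragment; only the values discussed above are shown.
set server "myserver"            ;# assumed name; use your instance's own

ns_section "ns/server/${server}"
ns_param   minthreads  50
ns_param   maxthreads  50

ns_section "ns/db/pool/subquery"
ns_param   connections 5         ;# was 2; repeat for the other pools
```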

After that the site started working perfectly well, serving pages with up to 100 simultaneous connections (tried it with ab).

Maybe I should put them a bit lower, though, I'll keep on testing.

6: Re: Server melting down (response to 1)

Posted by C. R. Oldham on 09/16/03 11:52 PM

Heh,

We just put our new annual report online for our members to fill out. On a dual 800 MHz Xeon (AOLserver 3.5.6) with 2 GB of RAM, Linux kernel 2.4.22 we have

ns_param MaxThreads 40        ;# Maximum threads
ns_param MinThreads 40        ;# Start this many on server start
ns_param ThreadTimeout 60     ;# Shut down a thread if idle after 60 seconds
ns_param MaxConnections 100   ;# Queue and service this many conns.
                              ;# If over this many conns, send 503 Server Busy
ns_param MaxDropped 0         ;# Disable auto shutdown of server in case of
                              ;# too many Server Too Busy messages.

And each of the main/log/subquery database pools has 55 connections configured. I did it that way because I didn't want a thread to wait on a connection if by some chance all the threads hit the DB at the same time.

Our nsd RSS sizes are hovering around 719 MB and have been as high as 1.1 GB. That doesn't seem normal to me, but things are running OK, so I'm not (yet) worried.

We are seeing OK performance. I think it should be better but I'm having a hard time telling if the database or the webserver is the bottleneck. Suggestions for determining that will be appreciated.

7: Re: Server melting down (response to 6)

Posted by Patrick Giagnocavo on 09/17/03 12:10 AM

C.R., that seems tremendously high.

I would not be surprised to find performance increases if you dropped the threads to 20 and the db pools to say 15 apiece.

You could try using the nstelemetry.adp and loading the page a few times when the server is busy to see how many threads are actually busy.
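
The same kind of numbers can also be pulled by hand. A minimal sketch, assuming your AOLserver version provides the ns_server command with the "threads", "active" and "queued" subcommands (check your version before relying on it); it only works inside a running nsd, for example saved as a .tcl page under the pageroot:

```tcl
# Runs only inside a live nsd (a .tcl page, an ADP, or the control port).
# Assumes ns_server supports the threads, active and queued subcommands.
set threads [ns_server threads]            ;# thread pool counters as reported by the server
set active  [llength [ns_server active]]   ;# connections being serviced right now
set queued  [llength [ns_server queued]]   ;# connections waiting for a free thread

ns_return 200 text/plain "threads: $threads\nactive: $active\nqueued: $queued\n"
```

If the active count stays well below maxthreads while the server is busy, the extra threads are probably not buying you anything.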

My guess: unless you are serving more than 5000 visitors per day, the lower counts I recommend will be more than adequate. A better number for RAM usage would be under 200 MB; one client serves 50+ GB per month of db-intensive pages and stays at about 100-115 MB for the nsd8x process size.

You should probably also bump up the amount of shared memory that Postgres uses, along with the buffer settings and so on.

8: Re: Server melting down (response to 1)

Posted by C. R. Oldham on 09/17/03 07:42 PM

Thanks for the advice Patrick.

I'll do some more tuning. BTW we are on Oracle, not PG. Why should DB connections in each pool be less than the number of threads?

And does anyone know a good way to tell whether our bottleneck is the webserver or the DB? They are on separate boxes. The Oracle server is 8.1.7.4, 4 GB, dual 2.6 GHz Xeon, hyperthreading turned off. One RAID 10 array.

I have nstelemetry installed, but am having trouble interpreting it.

--cro

9: Re: Server melting down (response to 8)

Posted by Andrew Piskorski on 09/18/03 06:26 AM

C.R., a ThreadTimeout of 60 is very low, but for you it is irrelevant because you have max and min threads set the same.

Presumably a connection thread ends up spending a large fraction of its time feeding content back to a slow modem user over TCP/IP, and the thread is not holding a database handle during that time (well, as long as you haven't botched your Tcl programming it's not). So it seems reasonable that most people use a number of threads substantially larger than their number of db handles.
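
To illustrate what "not botched" means here, a minimal sketch using the plain ns_db API: the handle goes back to the pool before the response is written, so a slow client ties up a thread but not a db connection. The pool name "main" and the "users" table are assumptions for the example, and OpenACS code would normally go through its own db API rather than raw ns_db:

```tcl
# Runs only inside nsd, e.g. as a .tcl page. Pool and table names are assumed.
set db  [ns_db gethandle main]                     ;# borrow a handle from the pool
set row [ns_db 1row $db "select count(*) as n from users"]
set n   [ns_set get $row n]                        ;# copy the data out of the result set
ns_db releasehandle $db                            ;# give the handle back first...

ns_return 200 text/plain "users: $n\n"             ;# ...then feed the (possibly slow) client
```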

I suspect an Oracle connection is also pretty heavyweight, so you are probably sucking up lots of RAM for no good reason in database connections which you never use. Hopefully it will all just be swapped out most of the time, of course.

But all that is just my supposition. Some kind of nice Linux, AOLserver, Oracle, PostgreSQL, and OpenACS performance test suite, and good repeatable results from it, would be nice to have. :)