Forum OpenACS Q&A: Server Load Question...

Posted by James Thornton on
My web server is on a dedicated ADSL connection
(128K upload), and I am getting ~13 KB sustained throughput when I
download a large file from an external connection.

I am running AOLserver 3.2 and PostgreSQL 7.0.3 on a Linux box with a
single PIII 500 MHz processor and 320 MB memory.

For a university group project, we are creating an online "CBT" course
that our classmates (~50 students) will take from the school's
computer lab during class next week (school has a T3).

I built the course with the WimpyPoint module, and I added a feature
that allows you to add a CBT style question to any slide. The course
is approx. 55 slides (~5KB each), and it should take about an hour to
complete.

Yesterday, I was trying to determine if my DSL line would handle it.
Today, my group tested it -- all seven of us went through the
the course at the same time but at an accelerated rate to simulate
more users.

I checked load average as we were hitting it, and it was over 4. Does
this seem right? How many simultaneous active users should this
server support?

Also, I have seen numbers for how many simultaneous active users a T1
will support, but how many should a 128K aDSL line support? How is
this determined taking into account the "bursty" nature of the Web?

Thanks.

Posted by Don Baccus on
If you're mostly serving text you can serve a lot over a 128K aDSL line.  Each page will be a few KB.  Once you start serving lots of images and graphics you'll start to get clogged up bandwidth.
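
A rough back-of-the-envelope check, using the ~5 KB slide size and 50 students from the original post (illustrative arithmetic only):

# Theoretical upstream of a 128 Kbit/s line, in bytes per second
echo $((128 * 1000 / 8))          # ~16000 bytes/s (James measured ~13 KB/s)
# How many ~5 KB slides that can push per second
echo $((16000 / 5000))            # ~3 slides/s at best
# Average request rate if 50 students each fetch 55 slides over an hour
echo "scale=2; 50 * 55 / 3600" | bc   # ~0.76 requests/s, well under the limit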

Regarding your system load, do your wimpy point presentations include graphics and images?  If so, streaming them to your users will be somewhat CPU intensive.  I would be surprised to see it so high, though, with seven users on a relatively low-bandwidth line.

You've got plenty of memory, so ...

Are you starting the postmaster with a high -B value?  (If it is low, you're probably doing a lot of transferring from OS filebufs to the buffer cache, which is cycle-intensive.)  The PG default of 64 is far too low for any production work.

Have you vacuum analyzed your database after adding significant chunks of content?  This is crucial for good PG performance.
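
For reference, a minimal way to run it from the shell (the database name here is just a placeholder):

# Run a vacuum plus statistics collection on the whole database
psql -c "VACUUM ANALYZE;" yourdb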

Lastly, knowing that the load number is high isn't of all that much use without knowing which processes are chewing up all those cycles...
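
A quick way to see that with standard tools (nothing AOLserver-specific; the head count is arbitrary):

# Interactive view, sorted by CPU usage
top
# One-shot snapshot of the top CPU consumers (column 3 of ps aux is %CPU)
ps aux | sort -rnk 3 | head -15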

Posted by Stephen van Egmond on
Load average really isn't that useful a number.  You want the CPU idle % to know how idle your machine is.  The sysstat command will tell you:

# sysstat 1
time              CPU    %user    %nice  %system    %idle
13:10:09          all     0.00     0.00     1.00    99.00

If you look carefully at the definition of load average, it doesn't accurately reflect how busy your CPU is.  A system could have a load average of 20 and still not have its CPUs saturated.
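
vmstat gives a similar picture, and it also shows whether the machine is waiting on I/O rather than CPU (a quick sketch; the "id" column is idle CPU and the "b" column is processes blocked on I/O):

# Five one-second samples; watch the "id" (idle) and "b" (blocked) columns
vmstat 1 5
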
Posted by mark dalrymple on
And to repeat something already said above: load average isn't interesting in and of itself, but in relation to what it has been before.  If your load is usually 3.7, then a load average of 4 isn't all that interesting.  If your load average is usually 0.01, on the other hand, you may have something wacko.
Posted by James Thornton on
If you do a search on Google for "uptime unix load average," then you will find http://www.useforesite.com/uptime.shtml.

Here's an excerpt (is it inaccurate?)...

Load Average

On a single processor machine, a load of 1 is maximum efficient utilization. Loads more than the number of processors mean the machine is too heavily loaded. Any load numbers in the 2 or 3 range is an indication of excessive CPU use and consequently poor performance. Load average numbers should be in the decimal range, for example; .02 or .53. 

Load average is the amount of load that the server's CPU is experiencing. What creates load on a CPU? When a program is run i.e., a search program, a shopping cart program, a request to upload a web site's page to a browser, an email program etc.. When any of the preceding scenarios occur, a load (or demand) is placed on the server's CPU. Some processes are given a higher priority by the CPU i.e., if a server is performing a search and a visiting web surfer happens to request a web page from a site hosted on that same server, then the page upload is given priority over the search. The search will slow down in order to accommodate the page upload.

Relative to page uploads the CPU's load average is not as critical as the pipeline to server. The pipeline is the connection from the server to the backbone provider. Pipelines are designated as 0C3, DS3, T3, T1, etc.. and are an indication of how much data can be transmitted in kilo bytes per second. A heavily loaded CPU will usually be able to out perform the pipeline.

The load average numbers of 0.28, 0.18, 0.22 are reflections of 1, 5 and 15 minute intervals respectively.

Numbers like this "3.30, 1.05, 0.96" are not as much a cause for alarm as numbers like this "2.52, 2.56, 2.51". The second set of numbers show consistent heavy demand on the processor. This consistent heavy load will deny the web pages the priority they need to load quickly. The first set of numbers is indicative of a single process or program (such as a search) performing it's function and will likely end very soon.

Posted by James Thornton on
Don -

Thanks for your reply.

"Once you start serving lots of images and graphics you'll start to get clogged up bandwidth."

I am not putting any images over my DSL line -- all images are stored at my university Web space and are being pulled from there.

"The PG default of 64 is far too low for any production work."

What buffer size would you recommend?

"Have you vacuum analyzed your database after adding significant chunks of content?"

I ran it last night, and it was still running this morning -- how long should it take to run the first time?

"Lastly, knowing that the load number is high isn't of all that much use without knowing which processes are chewing up all those cycles..."

It was postmaster...

Posted by mark dalrymple on
I don't know what that guy is smoking, but I wish he would share.  The load average is just the depth of the kernel's "runnable" queue - the processes which are able to be run.  When a process is waiting for CPU resources, it's put on the runnable queue.  When a process unblocks from I/O, it gets put on the runnable queue.  When a process wakes up from a semaphore block, it gets put on the runnable queue.  Maybe your load is just due to a lot of I/O and not CPU at all?  The idea that a load average of one implies perfect CPU utilization is ludicrous.  I personally have seen monoprocessor machines running with loads of 2 and above that still had idle CPU available.
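
One quick way to check whether the load is coming from I/O rather than CPU is to look at process states (a rough sketch; "R" means runnable, "D" means blocked in uninterruptible I/O wait):

# List processes that are runnable (R) or stuck in I/O wait (D)
ps -eo stat,pid,pcpu,comm | egrep '^[RD]'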

Also, his talk of priorities is off base, unless the sysadmin / machine configurator takes special steps to make sure that the webserver program (or this "search" he's talking about) runs at a higher priority.  By default, all unix processes run at the same priority (or 'nice level'), so in this example the web pages being served will have the same opportunities for CPU time as the "search".
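
For what it's worth, giving one process a different priority takes an explicit step like the following (hypothetical command name and pid; lower nice values mean higher priority, and raising priority requires root):

# Start a job at reduced priority (nice value 10)
nice -n 10 some_batch_job &
# Bump the priority of an already-running process (pid 12345 is made up)
renice -5 -p 12345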

Posted by Don Baccus on
Vacuum running all night?  I've never seen it take more than a few minutes on my database, which isn't huge but isn't tiny either.

With 320MB of memory you can make the shared buffer pool about as big as you want -- you've got a ton of memory!  If you've compiled PG 7.0 with our recommended 16KB blocksize then -B 2000 will use 32MB, or 10% of the RAM on your box.  On my own 256MB server I run -B 6000, i.e. 96MB, and let AOLserver, the operating system file buffers, code, etc. use the rest.
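
The arithmetic behind those numbers, spelled out (buffer counts and the 16 KB block size as above):

# shared buffer pool = number of buffers (-B) * block size
echo $((2000 * 16 * 1024))    # roughly 32 MB with -B 2000
echo $((6000 * 16 * 1024))    # roughly 96 MB with -B 6000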

Posted by James Thornton on
vacuum analyze keeps hanging here...
NOTICE:  --Relation referer_log--
NOTICE:  Pages 519: Changed 0, reaped 499, Empty 0, New 0; Tup 23777: Vac 42322, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 72, MaxLen 324; Re-using: Free/Avail. Space 5118444/5111928; EndEmpty/Avail. Pages 0/498. CPU 0.05s/0.22u sec.
NOTICE:  Index referer_log_date_idx: Pages 158; Tuples 23777: Deleted 0. CPU 0.02s/0.06u sec.
Posted by James Thornton on
I decided to back up the database, drop referer_log, recreate it and its index, and try vacuum analyze again with an empty referer_log.

It worked -- vacuum analyze completed in less than two min.
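
For anyone repeating this, the rough sequence from the shell (the database name is a placeholder, and the table has to be recreated from the original data model, which isn't shown here):

# Back up just the problem table, then drop it
pg_dump -t referer_log yourdb > referer_log.dump
psql -c "DROP TABLE referer_log;" yourdb
# recreate referer_log and referer_log_date_idx from the data model,
# then re-run the vacuum:
psql -c "VACUUM ANALYZE;" yourdb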

Posted by James Thornton on
For the archive...

I was getting this error when trying to start postgres with a shared buffer count (-B) greater than the default of 64...

[postgres@roam pgsql]$ /usr/local/pgsql/bin/postmaster -B 2000 -D /usr/local/pgsql/data

IpcMemoryCreate: shmget failed (Invalid argument)
key=5432001, size=33652736, permission=600
This type of error is usually caused by an improper
shared memory or System V IPC semaphore configuration.
For more information, see the FAQ and platform-specific
FAQ's in the source directory pgsql/doc or on our
web site at http://www.postgresql.org.
FATAL 1: ShmemCreate: cannot create region

I found the solution at http://www.ca.postgresql.org/devel-corner/docs/postgres/kernel-resources.html#SYSVIPC-PARAMETERS.

In a nutshell...

[regarding GNU/Linux] The default shared memory limit (both SHMMAX and SHMALL) is 32 MB in 2.2 kernels, but it can be changed in the proc file system (without reboot). For example, to allow 128 MB:

$ echo 134217728 >/proc/sys/kernel/shmall
$ echo 134217728 >/proc/sys/kernel/shmmax

NOTE: 134217728 = 128 * 1024 * 1024

You could put these commands into a script run at boot-time.
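
On many Linux systems the usual place is rc.local (the exact path varies by distribution):

# e.g. appended to /etc/rc.d/rc.local
echo 134217728 >/proc/sys/kernel/shmall
echo 134217728 >/proc/sys/kernel/shmmax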

Alternatively, you can use sysctl, if available, to control these parameters. Look for a file called /etc/sysctl.conf and add lines like the following to it:

kernel.shmall = 134217728
kernel.shmmax = 134217728

This file is usually processed at boot time, but sysctl can also be called explicitly later.
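
To apply the settings right away without a reboot (standard procps sysctl flags):

# Re-read /etc/sysctl.conf immediately
sysctl -p /etc/sysctl.conf
# Or set the values directly
sysctl -w kernel.shmall=134217728
sysctl -w kernel.shmmax=134217728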

Other parameters are sufficiently sized for any application. If you want to see for yourself look into /usr/src/linux/include/asm-xxx/shmparam.h and /usr/src/linux/include/linux/sem.h.