Forum OpenACS Q&A: Server recommendation for 20,000 users in dotLRN?

hi!

I would like some help in choosing a server for a dotLRN site that would have around 20,000 registered users. Oracle is the preferred database and it would run on a separate server. What would be a good server to recommend for use with Linux?

Any guidelines about how one would go about choosing a server for a large OpenACS/dotLRN site?

Thanks

/Mohan

Posted by Bruno Mattarollo on

Hello Mohan,

When you say it would run on a separate server, do you mean that the DB server would run separately from the web server(s)?

When you ask about a server, are you asking about brands, or about hardware characteristics as well?

Sorry to ask so many questions instead of answering, but it would be good to have a bit more precision...

Posted by Mohan Pakkurti on
Hi Bruno,

The group already has a 4-CPU HP server that is installed as an Oracle server, so the idea is to use that as the db server and a separate machine for the web server.

I am looking for recommendations in terms of hardware characteristics: number of CPUs, memory, etc. The preferred operating system is Linux, and I think the university is looking into buying a Dell server, so if there is a specific model that is recommended, that is also great.

/Mohan

Posted by Bruno Mattarollo on
Cool... I understand better now :)

What we decided to do here at Greenpeace is to use several uniprocessor machines for the webservers. Each machine is a Pentium III 800MHz with 512MB of RAM (even though now it seems cheap enough to have 1GB) and 10Krpm SCSI HDDs. These machines are HPs and have been running as webservers for more than 260 days without needing a reboot (Red Hat 7.3). We are currently serving peaks of 16 hits/sec per machine, sustained over an hour, and the load is quite noticeable; the CPU is really busy :)

Another way would be to go the SMP route: have a dual (or quad) processor machine to handle AOLserver, so you don't have the problem of trying to synchronize the cache among the different machines.

This work from Philip Greenspun, Eve Andersson, and Andrew Grumet

http://philip.greenspun.com/internet-application-workbook/scaling

might give you some ideas on SMP vs. single-processor setups. We decided to go the single-processor way because it's easier to add more identical, simple machines as the load goes up.

We have had quite a good experience with Dell so far, running Red Hat Linux (5.x, 6.x, 7.x, and now 8.0). I have been using a Dell 2500 with a hardware RAID card for quite some time now. Nice and fast, but it's working as a database server.

Hope this helps ...
BTW, all this wasn't for dotLRN, just OpenACS + custom code.

Posted by Robert Locke on
"then you don't have the problem of trying to synchronize the cache among the different machines"

Out of curiosity, will a default OACS installation work in a multi-server environment like the one you described for GP (i.e., separate web and db servers)? If not, what work is involved?

Posted by Mohan Pakkurti on
Bruno!

Thanks for your comments and information. It is nice to hear the actual number of requests you are able to support per CPU. Do you have some kind of special connection between the database server and the web server machines? Or do you just connect them to an Ethernet hub and let them talk to each other like any other machines on your network?

Thanks

/Mohan

Posted by Bruno Mattarollo on

Hello,

Robert, we had a default OpenACS 4.5 installation. Of course we are not using a global cache on the two webservers, but if you wanted one, you would have to look at some of the work that was done by aD: you could specify a cluster in the configuration, and when you flushed the cache on one webserver, it would request a special URL that flushed the cache on the other machine(s). We haven't implemented that on these machines.

Mohan, the two webservers and the database server are on a 100Mbps switch, so to answer your question, there is nothing special between the machines. Of course, the latency between the machines should be kept to the strict minimum; you could look at 1Gbps Ethernet these days :) even though the network is not really the bottleneck between the DB and the webservers.

One of the problems we are experiencing right now (and I think it will get better with AOLserver 4 on the horizon) is that the webservers are at 100% CPU utilization at peak times, meaning that the CPU is our bottleneck on those machines, though there are problems in the configuration of the load balancer as well. What I mean is that even if your DB is properly configured and you are caching database queries or pages, you should be careful to have enough RAM and enough CPU on the webservers... But adding a single-processor webserver is much cheaper than upgrading our Sun 250 to another model!

Posted by Brian Fenton on
Bruno,
I'm just curious. How did you come up with the 16 hits/sec figure? Was it just by examining your access logs?
Posted by Bruno Mattarollo on
We are analyzing our logs with "sawmill". I took the hourly breakdown for a random weekday, took the top hour's traffic, and divided the number of hits by 3600 (seconds in one hour); that's how I came up with 16 hits per second.

That's why I mentioned "during one hour" in my original message. It's not very clear, indeed, but that's how I came up with these figures.

Just as an example, on Nov 21st between 1pm and 2pm we had 71304 hits, which comes to 19.8 hits per second. Our load balancer never serves content from two servers at the same time, always from *one* :( That's one of the problems we are facing; it should distribute the load among the two machines. I think we could serve more hits per second if our load balancer were acting differently ;)

Posted by Don Baccus on
Mohan, the major problem you're going to face, I think, is the slow performance of the permissions system, which dotLRN relies on heavily. I have done some experimenting with materializing views, which dramatically increases performance, but that won't be available until 4.7 is released in the spring.

Just a heads up for you ...

AOLserver 4.0 (Tcl 8.4 in particular) should indeed help out Greenpeace a lot.

Posted by C. R. Oldham on
Don,

Please note that materialized views in Oracle are an Enterprise Edition feature and are not available in Oracle Standard Edition.

Posted by Dirk Gomez on
A warning: Don, we tried materialized views that refresh on commit on a project. It all worked fine and dandy in the lab, but it was a complete disaster on the real system, with concurrent access and considerably more data. Contention was so bad that Oracle came to a halt every 15 minutes or so.

Jonathan Lewis, in "Practical Oracle 8i", on this:

"First, the ovehead that hits the system when you commit is large. In a busy OLTP system, several users could hit that moment together aund cause some fairly severe contention for resources. Second, if we execute a query against the base data when we have updated ut, but not yet committed our change, the query rewrite may not take place, and we may actually visit the base table rather than the materialied view." (p 530)

"Materialized views with fresh on commit have a signifanct overhead on commit." (Page 532)

Tom Kyte writes much the same.
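
For concreteness, this is roughly what the refresh-on-commit pattern Dirk is describing looks like in Oracle. The table and view names here are invented for illustration; they are not from his project or from OpenACS:

```sql
-- Hypothetical example of a fast-refresh-on-commit materialized view.
-- The MV log records every change to the base table; it is written on
-- every DML statement against that table.
CREATE MATERIALIZED VIEW LOG ON group_member_map
  WITH ROWID (group_id, member_id)
  INCLUDING NEW VALUES;

-- REFRESH FAST ON COMMIT means every commit that touched the base
-- table must also update the view before it completes; that is the
-- source of the contention under concurrent OLTP load.
CREATE MATERIALIZED VIEW group_member_counts
  BUILD IMMEDIATE
  REFRESH FAST ON COMMIT
  ENABLE QUERY REWRITE
  AS SELECT group_id, COUNT(*) AS member_count
       FROM group_member_map
      GROUP BY group_id;
```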

Posted by Don Baccus on
Actually, I wasn't talking about using Oracle's materialized view feature, but rather about using triggers in PG and Oracle to maintain a materialized version of the person_member_map. Sorry for the jargon confusion.

This will only change when people are added to or removed from groups (say, a class in dotLRN or the membership group for the site at large).

There's also a PG-specific view in the permissions system that we're materializing via triggers; that one only changes when new privileges are added, typically when a new package is installed.

I've done some reading on Oracle's materialized views, and they seem all bound up with replication and distributed databases, and less oriented towards something simple like this. The more I read about them, the more the words "complexity" and "high overhead" came to mind.

Since we know the relationships between the various tables and the views in question, we can write reasonably efficient triggers to maintain the tables.
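
To make the trigger approach concrete, here is a minimal PostgreSQL sketch of the idea. The table, column, and trigger names are invented for illustration; they are not the actual 4.7 code:

```sql
-- Hypothetical sketch of a trigger-maintained ("materialized") map.
-- The real OpenACS names differ; this only shows the technique.
CREATE TABLE person_member_map_mat (
    person_id  integer NOT NULL,
    group_id   integer NOT NULL,
    PRIMARY KEY (person_id, group_id)
);

-- Keep the flat map in sync whenever a membership row is added or
-- removed, so permission checks can read a plain indexed table
-- instead of expanding a view at query time.
CREATE FUNCTION maintain_member_map() RETURNS trigger AS '
BEGIN
    IF TG_OP = ''INSERT'' THEN
        INSERT INTO person_member_map_mat (person_id, group_id)
        VALUES (NEW.person_id, NEW.group_id);
        RETURN NEW;
    ELSE  -- TG_OP = ''DELETE''
        DELETE FROM person_member_map_mat
         WHERE person_id = OLD.person_id
           AND group_id  = OLD.group_id;
        RETURN OLD;
    END IF;
END;
' LANGUAGE 'plpgsql';

CREATE TRIGGER member_map_trigger
    AFTER INSERT OR DELETE ON memberships  -- hypothetical base table
    FOR EACH ROW EXECUTE PROCEDURE maintain_member_map();
```

The write overhead is confined to membership changes, which are rare compared to permission checks, while reads hit a flat, indexed table.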