Forum OpenACS Q&A: Response to OpenACS scalability, and testing.

Posted by Jeff Barrett on
I do not run OpenACS, but I run a high-volume web site backed by Postgres 7.1. As I type this we are ramping up our systems for a major rollout next year, with 100-120 hits per second expected in bursts lasting 2-3 days (I think a slashdotting usually only lasts half a day?).

On the DB side alone (the web application is Apache and PHP for dynamic content), we have ripped out all queries but one and replaced them with cached files. The one remaining query/update tracks user sessions, and it has been moved into a PL/pgSQL function, since the overhead in our environment is in moving data sets back and forth between the database servers and the 6 load-balanced web servers. (Not to mention that PHP and Apache suck at closing and maintaining database connections, even with persistent connections; look at lingerd and TCP stack tweaking to fix that up a bit for Apache.) So every page makes a minimum of 1 SQL call, with other, less frequently requested pages making up to 10 per page, and once again as much work and logic as possible is done on the DB side to minimize the amount of data passed between systems.
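As a rough illustration of replacing a query with a cached file, here is a minimal sketch in Python (not the PHP the site actually runs). The cache directory, the `cached_fetch` name, and the five-minute lifetime are all assumptions made for the example; `produce` stands in for whatever code would otherwise hit the database.

```python
import os
import tempfile
import time

# Hypothetical cache location; a real site would pick its own directory.
CACHE_DIR = os.path.join(tempfile.gettempdir(), "page_cache")


def cached_fetch(key, produce, max_age_seconds=300):
    """Return cached text for `key`, rebuilding it with `produce()` when stale.

    `key` must be safe to use as a file name (no path separators).
    """
    os.makedirs(CACHE_DIR, exist_ok=True)
    path = os.path.join(CACHE_DIR, key + ".cache")
    try:
        if time.time() - os.path.getmtime(path) < max_age_seconds:
            with open(path, "r", encoding="utf-8") as fh:
                return fh.read()
    except OSError:
        pass  # missing or unreadable cache file: fall through and rebuild

    text = produce()  # the expensive step, e.g. a database query
    # Write to a temporary file and rename it into place, so a concurrent
    # reader never sees a half-written cache file.
    fd, tmp_path = tempfile.mkstemp(dir=CACHE_DIR)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(text)
    os.replace(tmp_path, path)
    return text
```

With something like this in front of a query, the database is only touched when the file is older than `max_age_seconds`, which is the whole point of the exercise: most page views never reach Postgres at all.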

In our current testing we ran a fairly thorough test program that simulated 500 concurrent users for three days, with each user requesting a page at a random interval of 8-15 seconds based on several usage patterns, and Postgres held up fine (nightly vacuuming was needed). I think we grew one of the main tables from 0.5 million rows to 20 million during that test.

Take images off the main web server and place them on a stripped-down machine with a very efficient build of Apache installed (there are other bare-bones web servers for this same job, but I have to use 'popular' web server software to keep some clients happy), and get a nice load balancer like an Alteon AD3 to balance the load and redirect the image requests to that machine.
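To put the load test described above in perspective, the page rate it implies can be worked out directly. The sketch below is in Python and assumes each user's wait is uniformly distributed between 8 and 15 seconds, which is a simplification (the real test mixed several usage patterns). The function names are made up for the example.

```python
import random

USERS = 500
MIN_WAIT, MAX_WAIT = 8.0, 15.0    # seconds between page requests, per user
TEST_SECONDS = 3 * 24 * 60 * 60   # three days


def expected_rate(users=USERS, lo=MIN_WAIT, hi=MAX_WAIT):
    """Average page requests per second if waits are uniform on [lo, hi]."""
    mean_wait = (lo + hi) / 2.0
    return users / mean_wait


def simulate_requests(users, duration, lo=MIN_WAIT, hi=MAX_WAIT, seed=0):
    """Count the requests `users` simulated clients issue in `duration` seconds."""
    rng = random.Random(seed)
    total = 0
    for _ in range(users):
        clock = rng.uniform(0, hi)  # stagger each user's first request
        while clock < duration:
            total += 1
            clock += rng.uniform(lo, hi)
    return total


if __name__ == "__main__":
    rate = expected_rate()
    print("expected rate: %.1f page requests/second" % rate)
    print("three-day total: %.1f million" % (rate * TEST_SECONDS / 1e6))
    print("simulated hour: %d requests" % simulate_requests(USERS, 3600))
```

Under that assumption the test works out to roughly 43 page requests per second, or a bit over 11 million page requests across the three days. Note these are page requests, not hits: images and other static files are not counted here.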

It has taken about a month for two people to set the servers up, and another month is expected for code tweaking. Look at Above.net for colocation; they have some nice rates and they allow something like 36 hours of free, unpenalized bursting to whatever level you like. 36 hours would be enough time to fit a /.ing in without having to pay for some huge bandwidth allotment.

I have had the DB crash a couple of times, usually when restructuring a table. I inherited a system with no foreign keys, so I had a massive cleanup to do, and that is not what I call a production problem. When it did crap out, I just dumped the table, loaded a backup back in, rebuilt some indexes, functions and constraints, and I would be back up in an hour or so. I don't think maintenance takes that much time: automate your backups and vacuums and you should be fine.
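For what automating the backups and vacuums might look like, here is a minimal Python sketch meant to be run nightly from cron. It shells out to the standard `pg_dump` and `vacuumdb` command-line tools; the function names, the file naming scheme, and the idea of dumping before vacuuming are assumptions for the example, not a description of my actual scripts.

```python
import datetime
import subprocess


def backup_command(dbname, backup_dir, today=None):
    """Build a pg_dump command that writes a dated plain-SQL dump file."""
    today = today or datetime.date.today()
    outfile = "%s/%s-%s.sql" % (backup_dir.rstrip("/"), dbname, today.isoformat())
    return ["pg_dump", "-f", outfile, dbname]


def vacuum_command(dbname):
    """Build a vacuumdb command that also refreshes planner statistics."""
    return ["vacuumdb", "--analyze", dbname]


def run_nightly(dbname, backup_dir):
    """Dump first, then vacuum; check=True stops at the first failing step."""
    for cmd in (backup_command(dbname, backup_dir), vacuum_command(dbname)):
        subprocess.run(cmd, check=True)
```

A cron job could then call something like `run_nightly("mydb", "/var/backups/pgsql")`, where both the database name and the path are placeholders. Keeping one dated dump per night is what makes the "load a backup back in" recovery quick.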

The time aspect does not seem to be much of a problem for us, and since the dot-com bubble burst, hardware can be had dirt cheap (I think we got that Alteon for 3,000, what a deal!).

Good luck with the /.ing. I would like to know how it works out for you.

-- Jeff