Forum OpenACS Q&A: Scaling, Redundancy, Multiple Machines...
I need some ideas for our plans to scale up an OpenACS 3x system. Currently we use one 700 MHz single-processor machine to run RH6/OpenACS/AOLserver/PostgreSQL and, for mail, Postfix. This is "Box 1".
There are times when our "uptime" load average goes over 2.4. This most often happens when we use OpenACS to send email to our ~15,000 users. At other times the load average goes up when we serve many of our database-backed web pages.
We are considering getting "Box 2" at the data center. This box would handle all of the mail and bulk mail sent by AOLserver from Box 1.
Also, Box 2 could have the identical OpenACS/AOLserver/PostgreSQL setup as Box 1. Then, in the unlikely event of Box 1 failing, we could switch to Box 2 by adding Box 1's IP to Box 2.
So, what are some ways to mirror the DB on Box 2? In my simplistic thinking, and because there are only a few places where inserts and updates are done, I could add a ns_log Notice "SQL1234: Insert ...." so it would show up in the log. The logs could roll at short intervals... and be automatically copied to Box 2. Box 2 could parse the log file and update the mirror DB.... OR is there a better way???
We are looking for an inexpensive solution and not the megabux system that Jerry asked about in this thread:
Five 9s reliability, how would you do it?
Or would it be better to put the PG db on Box 2 and leave Box 1 with OpenACS/AOLserver? This scenario doesn't account for a possible failure.
Currently we are using an IDE drive and a box with 1/2 GB of memory. I know that RAID would be possible, but wouldn't some variation on my scenario above be just as good and perhaps more cost effective?
As time goes on, our database gets bigger and is relied on by more and more users, so it is important that it does not become corrupted or cause long downtimes.
If you do decide to do some kind of poor man's replication with the logfile, it would be substantially easier to just turn on debug in your nsd config. The pg driver will then log all statements. (Unfortunately for your purposes this means queries too, but I imagine you could just grep for insert | update...) This would be far easier than tracking down all the DML and adding a log statement with each.
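To make the "grep for insert | update" idea concrete, here is a rough Python sketch of the filtering step Box 2 would run on a copied log. The log line layout and the "SQL:" marker are made up for illustration (check what your driver actually writes), and matching delete as well is my own addition.

```python
import re

# Keep only statements that start with INSERT, UPDATE, or DELETE.
# The "SQL:" marker is an ASSUMED log format, not the real driver output.
DML_PATTERN = re.compile(
    r"\bsql:\s*((?:insert|update|delete)\b.*)", re.IGNORECASE
)

def extract_dml(log_lines):
    """Return the DML statements found in the given log lines, in order."""
    statements = []
    for line in log_lines:
        match = DML_PATTERN.search(line)
        if match:
            statements.append(match.group(1).strip())
    return statements

# Hypothetical log excerpt, for illustration only.
sample_log = [
    "Debug: SQL: select email from users where user_id = 1",
    "Debug: SQL: insert into users (user_id, email) values (2, 'a@example.com')",
    "Notice: serving /index.tcl",
    "Debug: SQL: UPDATE users set email = 'b@example.com' where user_id = 2",
]

for statement in extract_dml(sample_log):
    print(statement)
```

Keep in mind that a filter like this knows nothing about transactions. A statement that was later rolled back on Box 1 would still be replayed on Box 2, and statements that span several log lines would be cut off, so treat it as a starting point only.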
point it to the postgres server on the 1st box. Make sure the 1st box backs
up the database to the 2nd box daily (You don't have to restore it, you just
need a backup of it on there.)
Now you have the 2nd box that can handle your mailing load and take over
some of the web load and, if box 1 dies you can restore the database backup
that is already on box 2, start postgres, point aolserver to it, and you're back
up and running with minimal downtime and the loss of less than a day's
worth of data.
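A minimal sketch of the daily backup step described above, written as a Python function that only builds the two commands Box 1 would run from cron. The database name, directories, and the "box2" host name are placeholders I made up, and the date is arbitrary. Nothing is executed here.

```python
from datetime import date

def backup_plan(db_name, day, local_dir="/var/backups",
                remote="box2:/var/backups/"):
    """Build the commands for one dated dump and its copy to the 2nd box.

    local_dir and remote are HYPOTHETICAL locations; adjust to your setup.
    """
    dump_file = f"{local_dir}/{db_name}-{day.isoformat()}.sql"
    dump_cmd = ["pg_dump", "-f", dump_file, db_name]  # write a plain SQL dump
    copy_cmd = ["scp", dump_file, remote]             # ship it to Box 2
    return [dump_cmd, copy_cmd]

# Example with a made-up database name and an arbitrary date.
plan = backup_plan("openacs", date(2002, 1, 15))
for cmd in plan:
    print(" ".join(cmd))
```

To actually run it you could pass each command to subprocess.run(cmd, check=True) so a failed dump stops the copy. Running it more often than daily shrinks the window of lost data accordingly.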
Jonathan: ...a single 2cpu box would be both easier to administer and cheaper...
Does a 2-cpu box work just like a 1-cpu box, only faster, or are there other benefits that linux and processes can use, such as specifying cpu-1 for process x and cpu-2 for process y? And does this work with postgresql or aolserver or postfix?
David: Just install AOLServer and Postgres on the 2nd box, make sure it works but point it to the postgres server on the 1st box.
What do you mean, "point it to"? Does this mean that both databases get updated at the same time?
David: ...2nd box...take over some of the web load...
How does it take over the load? I assume that it is only a backup and not serving web pages.
David: ...and you're back up and running with minimal downtime and the loss of less than a day's worth of data.
or less if I do more frequent backups... Thanks...
Best to run "vmstat 5" to see whether the processes are CPU-bound, IO-bound, or whatnot. The vmstat man page has more info on what the numbers mean.
If the actual CPU usage is generally low, then the answer is simply to get a faster disk. You can mount /var/spool/mail on the faster disk and separate the disk operations for email from those for the web pages and database.
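As a rough illustration of reading those numbers, here is a small Python sketch that labels one vmstat sample line. The column layout differs between vmstat versions, so it takes the column names from the header line rather than from fixed positions. The thresholds and the sample lines are my own assumptions, not anything from the man page.

```python
def classify(header_line, sample_line, busy_pct=80, wait_pct=30, blocked=1):
    """Label one vmstat sample as cpu-bound, io-bound, or not saturated.

    busy_pct, wait_pct, and blocked are ASSUMED thresholds; tune to taste.
    """
    names = header_line.split()
    values = [int(v) for v in sample_line.split()]
    fields = dict(zip(names, values))

    cpu_busy = fields["us"] + fields["sy"]   # user + system CPU percent
    io_wait = fields.get("wa", 0)            # not reported by older vmstat
    if cpu_busy >= busy_pct:
        return "cpu-bound"
    if io_wait >= wait_pct or fields["b"] >= blocked:
        return "io-bound"                    # processes are stuck waiting
    return "not saturated"

# Hypothetical header and sample lines, for illustration only.
HEADER = "r b swpd free buff cache si so bi bo in cs us sy id wa"
print(classify(HEADER, "3 0 0 120000 20000 200000 0 0 5 10 150 300 85 10 5 0"))
print(classify(HEADER, "0 2 0 120000 20000 200000 0 0 900 1200 400 500 5 5 20 70"))
```

Skip the first line vmstat prints, since it reports averages since boot, and look at several samples taken during a mailing run before drawing conclusions.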
It can serve pages if you want it to if they are using the same database. Also you can make sure you connect to it to send out your mailings. The mailing work will be done by Box 2 but it will still use the database from Box 1.
And yes, more frequent backups means you'd lose less data in case of catastrophe.
no, 2 cpus is not the same as 1 faster cpu. yes, the OS will split processes among the CPUs where it can. postgres plays quite nicely this way (each backend is a separate process); don't know for sure about nsd since postgres is typically 90% of my cpu and nsd only a couple %.
david, if postgres is his bottleneck, having two servers hitting the same db won't help things which is why I suggested smp. but you're probably right that it's the mailing slowing things down.
And how does the load balancer work? Is it another box? Does each
aolserver box need its own IP address or maybe they get local
IPs via NAT by the load balancer....
Hey, I'm just imagining stuff here... Those that know can help me
understand. THANK YOU.
I would suggest having two to three network cards in all machines that are to be load balanced. The first network would be the public side that is reached via the load balancer, and the second would be for communication between servers and to the DB server. The third NIC would be for a failover situation with the load balancer (but if you want to do failover I suggest you just pay the cash and hire a network guru to figure that all out; that is where my knowledge ends and I personally pick up the phone and start signing checks). I would just use some cheap Cisco 2950 switches between the Alteon and each group of servers, and something along the same strength between the balanced servers and the DB machine(s).
When I last did this setup I was not using AOLserver but Apache and PHP. Be thankful you are running AOLserver; it is so much simpler to configure and streamline. With Apache I had to add lingerd and SQLRelay to run more than four webservers in front of a modestly sized Postgres DB machine, and that was only getting me 60-80pps with a minimum of one DB query per page.
One thing I would also recommend is getting a standard machine for your web servers so you can cookie-cutter their installation and know how long it will take your supplier to get a new machine to your colocation facility. The biggest nightmare I have in scaling up is adding machines, due to delays from manufacturers. Remember, redundancy means redundant headaches, so try to get some more people to help; the workload to implement these things needs load balancing as well.
spread out the load using round robin dns. The load balancer will
automatically send requests to the machines that are up. Round robin dns
will blindly send requests to all your machines.
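Here is a toy Python simulation of that difference, just to show the idea. The server names and the up/down table are invented, and a real load balancer does its health checks continuously rather than once up front.

```python
from itertools import cycle

def round_robin_dns(servers, n_requests):
    """Hand out servers in strict rotation, with no idea which are up."""
    rotation = cycle(servers)
    return [next(rotation) for _ in range(n_requests)]

def health_aware_balancer(servers, is_up, n_requests):
    """Rotate only over the servers currently marked as up."""
    live = [s for s in servers if is_up[s]]
    if not live:
        raise RuntimeError("no servers are up")
    rotation = cycle(live)
    return [next(rotation) for _ in range(n_requests)]

def failed_requests(assignments, is_up):
    """Count requests that were sent to a server that is down."""
    return sum(1 for s in assignments if not is_up[s])

# Hypothetical three-machine pool with one machine down.
servers = ["web1", "web2", "web3"]
is_up = {"web1": True, "web2": False, "web3": True}

dns = round_robin_dns(servers, 6)
lb = health_aware_balancer(servers, is_up, 6)
print("round robin dns failures:", failed_requests(dns, is_up))
print("load balancer failures:", failed_requests(lb, is_up))
```

With one of three machines down, the blind rotation still sends every third request to the dead box, while the health-aware version spreads the same requests over the two that are alive.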