Forum OpenACS Q&A: Response to Five 9s reliability, how would you do it?

Posted by David Walker on
I think DoS attacks are among the most difficult to deal with, since they can
be carried out with spoofed packets.  The easiest solution is to not be
where the attack is hitting, something your geographical distribution plan
might accomplish.

The more bandwidth your ISP has, the more options you have and the more
work an attacker must do to actually deny your service.

Many attacks become much easier to weather if you can block them at a
firewall.  If you discover you are under attack from ICMP packets or from packets
aimed at a non-mission-critical port, you can deny them and reduce the traffic
on your network (the attack packets will still come in, but no replies to them
will go out).
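As a rough sketch of that decision logic, the function below models a stateless inbound filter that drops ICMP and anything not aimed at a mission-critical port.  The `Packet` type, the `should_accept` name, and the assumption that only TCP ports 80 and 443 matter are all hypothetical; a real firewall expresses this as rules in its own configuration language, and this model ignores return traffic for connections the server itself opens.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: the site only needs inbound HTTP and HTTPS; adjust for your own services.
MISSION_CRITICAL_TCP_PORTS = {80, 443}


@dataclass(frozen=True)
class Packet:
    protocol: str             # "tcp", "udp" or "icmp"
    dst_port: Optional[int]   # None for ICMP, which has no port numbers


def should_accept(packet: Packet) -> bool:
    """Return True to pass the packet through, False to drop it without replying."""
    if packet.protocol == "icmp":
        # Dropping ICMP means an echo flood gets no replies from us.
        return False
    if packet.protocol == "tcp" and packet.dst_port in MISSION_CRITICAL_TCP_PORTS:
        return True
    # Default deny: everything else is treated as non-mission-critical.
    return False


if __name__ == "__main__":
    sample = [Packet("icmp", None), Packet("tcp", 80), Packet("udp", 137), Packet("tcp", 443)]
    for p in sample:
        print(p.protocol, p.dst_port, "ACCEPT" if should_accept(p) else "DROP")
```

Note that dropped packets have still crossed your uplink, so this saves outbound bandwidth and server work, not inbound bandwidth.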

I'd suggest not offering five nines from the launch of the site, but starting
at a later date so you can work out any kinks that come up.
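To see why a shakedown period helps, it is worth working out how small the five-nines budget is.  The sketch below assumes a 365.25-day year; `allowed_downtime_minutes` is just an illustrative helper name.

```python
# Assumption: a 365.25-day year, so leap years are averaged in.
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes


def allowed_downtime_minutes(nines: int, period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Downtime budget for an availability of `nines` nines (e.g. 5 -> 99.999%)."""
    unavailability = 10 ** (-nines)  # 5 nines -> 0.00001
    return unavailability * period_minutes


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"{n} nines: {allowed_downtime_minutes(n):.1f} minutes per year")
```

Five nines works out to about 5.3 minutes of downtime per year, or roughly 26 seconds in a 30-day month, so a single launch-week hiccup could use up the whole year's allowance.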

Definitely look at every router, firewall, or piece of equipment between the
internet and your servers and make sure it is redundant.
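The reason every device in the path matters can be shown with the standard series/parallel availability arithmetic.  This is a minimal sketch under two loud assumptions: component failures are independent, and failover between redundant units is instantaneous.  The 99.9% per-device figure is purely hypothetical.

```python
from math import prod


def series(*availabilities: float) -> float:
    """Every component must be up, e.g. router -> firewall -> server."""
    return prod(availabilities)


def parallel(*availabilities: float) -> float:
    """A redundant group that stays up while at least one member is up."""
    return 1 - prod(1 - a for a in availabilities)


if __name__ == "__main__":
    device = 0.999  # hypothetical availability of one router, firewall or server
    single_chain = series(device, device, device)
    redundant_pair = parallel(device, device)
    redundant_chain = series(redundant_pair, redundant_pair, redundant_pair)
    print(f"single chain:    {single_chain:.6f}")
    print(f"redundant chain: {redundant_chain:.6f}")
```

A chain of three such devices lands around 99.7%, because every link multiplies in its own risk; pairing each device brings the chain to roughly 99.9997%.  Real failures are often correlated (shared power, shared firmware bugs), so treat these numbers as an upper bound.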

I'm curious how successful Oracle is at running redundant database servers.
On Postgres my strategy is to use redundant hot-swappable disks and keep
a backup computer available in case of trouble.  If you try to reach five nines
with that strategy, you need a reasonably intelligent, educated person
sitting next to the servers at all times (which you might need anyway).
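One way to take the detection step off that person's shoulders is a watchdog that declares the primary dead after several consecutive failed health checks.  The sketch below is hypothetical: `FailoverMonitor` and its fields are illustrative names, and it only decides *when* to switch; how health is checked and how the backup machine is actually promoted (including keeping its data current) are left out.

```python
from dataclasses import dataclass


@dataclass
class FailoverMonitor:
    """Hypothetical watchdog: switch to the standby after `max_misses` consecutive failed checks."""
    max_misses: int = 3
    misses: int = 0
    active: str = "primary"

    def record_check(self, primary_healthy: bool) -> str:
        """Feed in one health-check result and return which machine should be serving."""
        if self.active != "primary":
            # Already failed over; switching back is deliberately left to a human.
            return self.active
        if primary_healthy:
            self.misses = 0  # a single success resets the count
        else:
            self.misses += 1
            if self.misses >= self.max_misses:
                self.active = "standby"
        return self.active


if __name__ == "__main__":
    monitor = FailoverMonitor(max_misses=3)
    for healthy in [True, False, False, True, False, False, False]:
        print(monitor.record_check(healthy))
```

Requiring several misses in a row avoids failing over on one dropped probe, but it adds detection time: with a check every 10 seconds and three misses, about 30 seconds pass before the switch, against a yearly budget of roughly five minutes.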

Even if your ISP has a super-duper redundant power system (and they should
if you want this level of reliability), add your own UPS as well to cover any
incidents that may arise.  One suggestion I've heard, which sounds good to me,
is to use redundant power supplies, each connected to a different power
source.  If one power supply is connected to your UPS and the other to the
ISP's power, then you can ride out a power problem at your ISP, or replace
your UPS without taking the server down.

How does a dynamic site handle having two geographical locations?  It seems
very easy to end up with unsynchronized sites no matter which sync method
you choose.

Make sure the customer knows that just because they can't see the site
doesn't necessarily mean that it is down (though it probably does if you have
redundant geographical locations).
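A simple way to tell "I can't see it" apart from "it's down" is to probe the site from several places and compare the results.  In this sketch the vantage-point names and the `classify` helper are hypothetical, and the probing itself (for example, an HTTP fetch from each location) is not shown.

```python
def classify(probe_results: dict[str, bool]) -> str:
    """probe_results maps a vantage-point name to whether it could fetch the site."""
    if not probe_results:
        raise ValueError("need at least one probe result")
    reachable = sum(probe_results.values())  # True counts as 1, False as 0
    if reachable == len(probe_results):
        return "up"
    if reachable == 0:
        return "down"
    # Reachable from some places only: more likely a network path problem than the site itself.
    return "partial"


if __name__ == "__main__":
    print(classify({"office": False, "isp-a": True, "isp-b": True}))
```

A customer report combined with a "partial" result points toward their connection or an upstream route rather than your servers.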