Forum OpenACS Q&A: Re: Business grade OACS hosting/managed services?

Posted by Mike Sisk on
I was reluctant to respond to this since I don't think blatant self-promotion belongs on these forums, but this is a subject that comes up from time to time and it's something we know a fair bit about.

Basically, furfly's core business is what you're asking about. We've been doing this for 5 years now and host high-bandwidth ACS sites for folks like The New York Review of Books, MIT's Archnet Project, and Edward Tufte.

We're expensive; our hosting prices start at $500 a month, depending on needs. But you can't provide enterprise-class service on the cheap.

Ok, enough self-promotion. Here are some things that, in our experience, are necessary for high-performance and high-availability hosting:

First, you gotta have a good network. It doesn't make any difference what kind of server hardware you have if your network is junk. You have to go up the food-chain as far as you can afford. Leasing space from your brother-in-law who is using space from an ISP that's using space from a bandwidth broker who's leasing from a real tier 1 host just ain't gonna work. If any of those folks in that chain can't make their monthly payment you're screwed.

We've been with Exodus (actually owned by Savvis now) since the beginning and deal directly with them. And while there's no guarantee that Exodus won't run out of money and lock the doors on any given day, there's at least some comfort in knowing that if that happens, the sites of folks like Yahoo!, Google, Slashdot, and Microsoft will go down, too. [Actually, the first time Exodus went into bankruptcy we were sent a memo that President Bush had signed off on Exodus being an "Important Infrastructure Utility" or something and that the US government would guarantee the continued operation of the datacenters.]

If you deal with a tier 1 host like Exodus, Level 3, or XO, a lot of little problems go away, too. Power will always work no matter what (the Exodus datacenter a few blocks from Ground Zero continued to operate during 9/11), and you'll have strong physical security, air conditioning, and fire suppression. The actual network is likely to be good, with multiple redundant connections.

In the 5 years we've been with Exodus (and several years of experience with them before we started furfly) we've never had a systemic power or network failure. None.

Now, after your network and physical space are taken care of, you need to look at hardware.

First, you need a good network switch if you're not being provided one. And a spare. And these need to be enterprise-class since they'll be running and loaded 24/7. Cisco is good but we've been happy with the HP ProCurve series. Don't go cheap here and get a hub from CompUSA -- you'll regret it. If you have lots of money and need more stress in your life you can get fancy and expensive highly-redundant units that monitor each other. Otherwise keep a spare onsite that you can use if the primary fails. This is important as the switch is a single point of failure -- if it dies your network is off-line.

What servers you need really depends on your workload. We like the Dell rack mounts with redundant power supplies, internal SCSI disks on hardware RAID, running Linux. Sun is good, too. I'm running one Apple OS X server right now as a test -- it shows promise.

OS doesn't matter much, either, as long as it runs the software you need and you know how to manage it. Most of ours are Red Hat Linux, the newer systems using their Enterprise offerings. FreeBSD, OpenBSD, OSX, and Solaris will all do the job.

I like to keep things simple. You can do things like load balancing, system failover, and virtual servers, but all these things add complexity -- you need to ask yourself how important they are to the task at hand and whether the added complexity is worth it. Photo.net recently upgraded from their quad-CPU Sun box to a new Dell machine with externally mounted RAID arrays. You can go over there and read how the added complexity is working for them.

Keep spare parts around, too. You never know when something minor like a fan will fail and cause a system to overheat and crash.

Monitoring is next. There is a whole range of products out there, from the disk space checking scripts in Red Hat to more in-depth packages like NetSaint (now Nagios) or Big Brother. You should pick one, use it, and have it send problems to a pager or cell phone you give to your sysadmin.
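The disk-space-checking end of that range can be sketched in a few lines. The 90% threshold and the plain print-an-alert behavior below are illustrative assumptions; a real setup would hand the alert to whatever notification path your monitoring package pages the sysadmin with:

```python
import shutil

# Illustrative threshold; tune for your environment.
THRESHOLD = 0.90

def check_disk(path="/"):
    """Return (used_fraction, over_threshold) for the given mount point."""
    usage = shutil.disk_usage(path)
    used = usage.used / usage.total
    return used, used >= THRESHOLD

if __name__ == "__main__":
    used, over = check_disk("/")
    if over:
        # In a real deployment this is where the pager/cell-phone
        # notification would fire.
        print(f"ALERT: / is {used:.0%} full")
    else:
        print(f"OK: / is {used:.0%} full")
```

Run from cron every few minutes, even something this small catches a filling disk before it takes the database down.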

Speaking of your sysadmin, the single most important thing is having one who knows what to do. What makes a good sysadmin is a topic for another time, but in general, there seems to be an inverse proportionality between how good they are and the number of certifications they have. What's important is not what they know, per se, but how good they are at figuring out how to solve problems, especially during an emergency.

Posted by Jesse Wendel on
I agree with Mike about everything he said, and especially about, "in general, there seems to be an inverse proportionality between how good they are and the number of certifications they have.  What's important is not what they know, per se, but how good they are at figuring out how to solve problems, especially during an emergency situation."

I manage 250 servers professionally for the largest non-municipal (NYSE:PSD) power company west of the Mississippi and north of San Francisco.  And when we screw up, the lights potentially go out over a third of Washington State.

When we're hiring - and I know, because I'm the first person to read the incoming resumes - the LAST thing we care about is what certifications someone has. In fact, certain certifications, or having too many of them, actually count against you. It tells me you're all fluff and no work.

The one thing we care about most is what actual experience someone has in a large datacenter, with demonstrated competency running projects and software similar or identical to ours.  If they don't have at least two years with at least 50 servers, we toss the application right then.

After that, we're especially looking for three things:

1. The ability to deliver the goods no matter what (accountability/ownership).

2. The ability to see the big picture, as in: after appropriate training, could the other senior team members all go on vacation for a month and know that when they come back, things will still be running and everything will be okay? Can they/will they always speak truth to power? (integrity/responsibility).

3. And this is always a deal-breaker: do they fit really well into OUR existing team? There are only 8-10 of us at any given time on the operating systems team. We have each other's back. In the past, there have been a couple of times when we've hired someone who didn't quite fit in or thought s/he was too good for the rest of us; we now take exceedingly great pains to pick real team players.

A great sysadmin makes up for a lot of failures in the datacenter.  Not to say you don't want to choose a good datacenter to host your server.  But if you don't have chemistry with the team who is going to host your server, I'd look elsewhere.

Caroline mentioned earlier - https://openacs.org/forums/message-view?message_id=171458 - that she's moving a site from ETP to BCMS this weekend.  That's the site I'm producing.  We'll be going live later this evening, so in the next day or so, Caroline and I will announce what we've been up to the past 2.5 months, and invite y'all to come take a look.

In the meantime, I can tell you that I host the site at www.zill.net, and I've been very satisfied with Patrick Giagnocavo's service and performance.