Forum OpenACS Q&A: Business grade OACS hosting/managed services?

I'm curious how other people are hosting OACS sites that require solid and consistent response time and uptime performance.  Does anyone have good experiences or recommendations?

Most of my opportunities seem to be with small to medium-sized clients who are not in a position to run their own data centers or even to have their own dedicated servers.  However, they still demand response time and uptime performance.

At the end of the day, they want their sites to be up and available and running fairly quickly (and I don't blame them).  But I've been struggling to deliver the goods.

I've tried running my own co-located server, and I found that to be too complicated, time-consuming, and resource intensive.  It seems like it may be the way to go if you have enough clients to achieve economies of scale, but I'm not there yet.  And I simply don't have enough time to be salesman, developer, AND sys admin.

I've tried shared hosting/managed services, and so far I've suffered real problems with performance and uptime.  My current host seems to be on a poor carrier/connection to the backbone, because my ping times are consistently in the 200 - 300 ms range with significant packet loss (pretty far upstream, not local to me).
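For anyone who wants to put numbers on latency and loss like those above, here is a minimal sketch. It is only an approximation: it times TCP connections rather than sending real ICMP pings, and the function names, port, and timeout are assumptions made for illustration, not anything from a particular tool.

```python
import socket
import statistics
import time
from typing import List, Optional, Tuple


def tcp_rtt_ms(host: str, port: int = 80, timeout: float = 2.0) -> Optional[float]:
    """Time one TCP connect to host:port in milliseconds; None if it fails."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        # Timeouts, refused connections and DNS failures all count as a lost probe.
        return None


def summarize(samples: List[Optional[float]]) -> Tuple[float, Optional[float]]:
    """Return (loss fraction, median round trip in ms) for a list of probes."""
    ok = [s for s in samples if s is not None]
    loss = 1.0 - len(ok) / len(samples) if samples else 0.0
    median = statistics.median(ok) if ok else None
    return loss, median
```

Collecting a few dozen `tcp_rtt_ms` results for the host at different times of day and passing them to `summarize` gives a rough record to show a provider, or to compare candidate hosts before moving.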

I've also experienced a fair amount of downtime, for a variety of reasons.  Often, I'm sure it's a network problem,  which the provider might say is beyond their control.  But to my clients it still means their site is down.  I would pay extra to be on a good connection, if it were an option.

Have other people had the same problems?  Has anyone found a good solution?

Posted by Torben Brosten on
Besides increasing reliability, I wonder to what extent small-time network operations can really prepare well against all attacks on specific business sites.

As far as I remember aD, it was founded to address 24x7 hosting and maintenance concerns.  In short, I remember it being explained as a set of growing, regional, economically separate teams that combined efforts to maintain servers/services 24x7, providing near enterprise level service.

I'd like to see OpenACS companies create a global alliance that provides these services. But in the realm of economic warfare, can we work together in a concerted effort like this, and thereby overcome an ongoing tragedy of the commons scenario (i.e. no world-class OpenACS offerings)? What would it take to form such an alliance?

We work with hub[1], which also hosts postgres sites. If you are considering co-location, XO[2] offers a network with multiple points of presence and an average cross-network latency of 100ms max (last I checked), which is probably one reason some leading hi-tech internet companies use them for fast content delivery. Things change so fast that I don't know how current this is, nor how relevant it is to your situation.

1. http://www.hub.org/vServer.php
2. http://xo.com/about/network/

cheers,
Torben

Posted by Don Baccus on
Well, the good solutions still cost a fair amount of money, that's my thinking.  It's easy to forget, for instance, how expensive our software offerings are.  They're FREE of course, GPL-free, but they're also expensive ... VC capital expensive for the aD-designed part, AOL-profit expensive for the AOLserver part, etc etc.

You're going to get, within reason, what you pay for with hosting.

On the other hand, I'm somewhat curious about your experiences co-locating.  How much time does it take?  I have a couple of simple sites on my co-located server and I only touch it every three or more months.  I did my first panic reboot (i.e. kernel panicked) ever a few weeks ago ... first time I had to reboot the server to recover from a problem (in like 5-6 years).

If you've had problems, it might relate to some degree to the traffic you're running vs. your server, or network connectivity.

Where are you located?

Posted by Mike Sisk on
I was reluctant to respond to this since I don't think blatant self-promotion belongs on these forums, but this is a subject that comes up from time to time and it's something we know a fair bit about.

Basically, furfly's core business is what you're asking about. We've been doing this for 5 years now and host high-bandwidth ACS sites for folks like The New York Review of Books, MIT's Archnet Project, and Edward Tufte.

We're expensive; our hosting prices start at $500 a month and go up depending on needs.  But you can't provide enterprise-class service on the cheap.

Ok, enough self-promotion. Here are some things that, in our experience, are necessary for high-performance and high-availability hosting:

First, you gotta have a good network. It doesn't make any difference what kind of server hardware you have if your network is junk. You have to go up the food-chain as far as you can afford. Leasing space from your brother-in-law who is using space from an ISP that's using space from a bandwidth broker who's leasing from a real tier 1 host just ain't gonna work. If any of those folks in that chain can't make their monthly payment you're screwed.

We've been with Exodus (actually owned by Savvis now) since the beginning and deal directly with them. And while there's no guarantee that Exodus won't run out of money and lock the doors on any given day, there's at least some comfort knowing that if that happens the sites of folks like Yahoo!, Google, Slashdot, and Microsoft will go down, too. [Actually, the first time Exodus went into bankruptcy we were sent a memo saying that President Bush had signed off on Exodus being an "Important Infrastructure Utility" or something and that the US government would guarantee the continued operation of the datacenters.]

If you deal with a tier 1 host like Exodus, Level 3, or XO a lot of little problems go away, too. Power will always work no matter what (the Exodus datacenter a few blocks from Ground Zero continued to operate during 9/11), and you'll have strong physical security, air conditioning and fire suppression. The actual network is likely to be good, with multiple redundant connections.

In the 5 years we've been with Exodus (and several years of experience with them before we started furfly) we've never had a systemic power or network failure. None.

Now, after your network and physical space are taken care of you need to look at hardware.

First, you need a good network switch if you're not being provided one. And a spare. And these need to be enterprise-class since they'll be running and loaded 24/7. Cisco is good but we've been happy with the HP ProCurve series. Don't go cheap here and get a hub from CompUSA -- you'll regret it. If you have lots of money and need more stress in your life you can get fancy and expensive highly-redundant units that monitor each other. Otherwise keep a spare onsite that you can use if the primary fails. This is important as the switch is a single point of failure -- if it dies your network is off-line.

What servers you need really depends on your needs. We like the Dell rack mounts with redundant power supplies, internal SCSI disks on hardware RAID and running Linux. Sun is good, too. I'm running one Apple OS X server right now as a test -- it shows promise.

OS doesn't matter much, either, as long as it runs the software you need and you know how to manage it. Most of ours are Red Hat Linux, the newer systems using their Enterprise offerings. FreeBSD, OpenBSD, OSX, and Solaris will all do the job.

I like to keep things simple. You can do things like load-balancing, system failover, and virtual servers but all these things add complexity -- you need to ask yourself how important these things are to the task at hand and if the added complexity is worth it. Photo.net recently upgraded from their quad-CPU Sun box to a new Dell machine with externally mounted RAID arrays. You can go over there and read how the added complexity is working for them.

Keep spare parts around, too. You never know when something minor like a fan will fail and cause a system to overheat and crash.

Monitoring is next. There is a whole range of products out there from the disk space checking scripts in Red Hat to more in-depth packages like NetSaint (now Nagios) or Big Brother. You should pick one, use it and have it send problems to a pager or cell-phone you give to your sysadmin.
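To make the idea concrete, here is a minimal sketch of what the simplest kind of check script does. It is not how Nagios or Big Brother work internally; the function names, the 10% free-space threshold, and the `notify` callback (standing in for whatever pages your sysadmin) are all assumptions made for illustration.

```python
import shutil
import urllib.error
import urllib.request
from typing import Callable, List, Optional


def check_disk(path: str, min_free_fraction: float = 0.10,
               usage=shutil.disk_usage) -> Optional[str]:
    """Return a problem message if free space on path is below the threshold."""
    total, _used, free = usage(path)
    if free / total < min_free_fraction:
        return f"disk {path}: only {free / total:.0%} free"
    return None


def check_http(url: str, timeout: float = 10.0) -> Optional[str]:
    """Return a problem message if the URL does not answer with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            if resp.status != 200:
                return f"{url}: HTTP {resp.status}"
    except (urllib.error.URLError, OSError) as exc:
        # Covers DNS failures, refused connections, timeouts and HTTP errors.
        return f"{url}: {exc}"
    return None


def run_checks(checks: List[Callable[[], Optional[str]]],
               notify: Callable[[str], None]) -> List[str]:
    """Run every check, send each problem to notify, and return the problems."""
    problems = []
    for check in checks:
        msg = check()
        if msg is not None:
            problems.append(msg)
            notify(msg)
    return problems
```

Run from cron every few minutes, something like `run_checks([lambda: check_disk("/"), lambda: check_http("http://www.example.com/")], notify=send_page)` would do the job, where `send_page` is a hypothetical function that mails your pager gateway. The real packages add scheduling, escalation, and history on top of this basic loop.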

Speaking of your sysadmin, the single most important thing is having one who knows what to do. What makes a good sysadmin is a subject for another time, but in general, there seems to be an inverse proportionality between how good they are and the number of certifications they have.  What's important is not what they know, per se, but how good they are at figuring out how to solve problems, especially during an emergency situation.

Posted by Jesse Wendel on
I agree with Mike about everything he said, and especially about, "in general, there seems to be an inverse proportionality between how good they are and the number of certifications they have.  What's important is not what they know, per se, but how good they are at figuring out how to solve problems, especially during an emergency situation."

I manage 250 servers professionally for the largest non-municipal (NYSE:PSD) power company west of the Mississippi and north of San Francisco.  And when we screw up, the lights potentially go out over a third of Washington State.

When we're hiring - and I know, because I'm the first person to read the incoming resumes - the LAST thing we care about is what certifications someone has.  In fact, certain certifications, or having too many of them, actually count against you.  It tells me you're fluff, and not work.

The one thing we care about most is what actual experience someone has in a large datacenter, with demonstrated competency running projects and software similar or identical to ours.  If they don't have at least two years with at least 50 servers, we toss the application right then.

After that, we're especially looking for three things: 1. The ability to deliver the goods no matter what (accountability/ownership).  2. The ability to see the big picture: after appropriate training, could the other senior team members all go on vacation for a month and know that when they come back, things will still be running and everything will be okay?  Can they, and will they, always speak truth to power? (integrity/responsibility).  3. And this is always a deal-breaker: do they fit really well into OUR existing team?  There are only 8-10 of us at any given time on the operating systems team.  We have each other's back.  In the past, there have been a couple of times when we've hired someone who didn't quite fit in or thought s/he was too good for the rest of us; we now take exceedingly great pains to pick real team players.

A great sysadmin makes up for a lot of failures in the datacenter.  Not to say you don't want to choose a good datacenter to host your server.  But if you don't have chemistry with the team who is going to host your server, I'd look elsewhere.

Caroline mentioned earlier - https://openacs.org/forums/message-view?message_id=171458 - that she's moving a site from ETP to BCMS this weekend.  That's the site I'm producing.  We'll be going live later this evening, so in the next day or so, Caroline and I will announce what we've been up to the past 2.5 months, and invite y'all to come take a look.

In the meantime, I can tell you that I host the site at www.zill.net, and I've been very satisfied with Patrick Giagnocavo's service and performance.

Posted by Walter Smith on
Thanks everyone for the replies.  It sounds like there's something of a market gap in the space between DIY/hobbyist-level hosting and enterprise-class managed services.

Torben, I like the idea of a global alliance to address this need, and I would be interested in participating.  I will have to think about that one a bit.

Don, I'm in Los Angeles, and I had a server in one of the big data centers downtown.  I spent some time setting it up to be secured and stable, and at first it was running fine without much intervention.  Then I had my first security incident.

I was told that my server had consumed a couple hundred dollars worth of extra bandwidth, but it wasn't from any of the sites or services I had running.  I still don't know what happened, although I eventually concluded that someone may have actually physically plugged into my port and hijacked my bandwidth.  Fortunately they decided not to charge me for the overage.  But I had spent an enormous amount of time investigating the issue and subsequently monitoring traffic and bandwidth consumption.

All along I felt like I wasn't properly maintaining the box, because I didn't have time.  Out of necessity I was following the "if it ain't broke don't fix it" management philosophy, and it felt pretty risky.

Mike, I don't think I'm quite in your price range yet, although it may come to that.  I guess I'm hoping to find more of an SMB (small to medium-sized business)-class alternative.  So far my sites are pretty low-traffic/bandwidth, but they need better performance and reliability than I've been getting.  Are your services/prices based on dedicated servers?

Does anyone have experience running on ODSOL's VPS service?  I'm not familiar with virtual server environments, so I'm not sure if the RAM/CPU allocations are adequate for a production OACS site.