Uptime and Downtime

There was a pretty cool editorial posted on Slashdot on service downtime, citing BlackBerry outages over the last few months. The big question that the editorial asks is “why don’t we expect more?” It’s a tough nut to crack, but a lot of it revolves around the precedent set by service providers and available alternatives.

With RIM’s repeated service interruptions, we’re coming to expect, on average, one outage per month. It’s hard not to grow jaded and complacent. Ideally, consumer expectations should shape the products and services we receive, but it’s largely the other way around. If there are no other options, customers will make do with what they have and lower the bar to make the solution acceptable. In the corporate environment that BlackBerry dominates, alternatives aren’t there, especially when the modus operandi is handed down from on high. That works just fine; the extra accommodations can make sub-optimal products viable, but their value certainly drops. It might be unfair to call something that’s up 99% of the time broken, but if it’s put next to something that’s functioning 100% of the time, the value difference is clear. By offering anything less than 99.999% uptime, you leave a gap open for competition.

Of course there’s leeway – shit happens, right? No doubt the folks at RIM are doing their damnedest to make sure everything runs smoothly, but the more that external pressure for an excellent product and high standards dissipates, the more easily the exception can become the rule. If we aren’t expecting more, we’re expecting less, and the products we receive will reflect that. If innovation and reliable service are actually things that we as customers want, there have to be repercussions when those standards aren’t met; otherwise it’s just lip service. Those repercussions don’t have to be in the form of torches and pitchforks waved high, but anything other than resignation constitutes an opportunity for competition and better products all around.

No doubt letting outages go by without making a fuss is better for the blood pressure and the soul, and competition will always be pushing RIM to innovate regardless of the expectations of existing customers, but our standards will determine the pace of that innovation. If you want a better BlackBerry, just make some noise – RIM’s listening.

  • phlo

    Don’t forget the single most important reason: Monies.

    Guaranteeing 95% uptime of most anything is a rather easy task. React in less than 12 hours and you got your lower back covered for three outages per month. No need for extra staff and that kind of stuff – almost free.
    Raising the bar to 99% makes the thing quite a bit more difficult. Still, seven hours of reaction time at one outage per month is realistic even for motivated individuals with somewhat reliable software – some work necessary.
    Jack that up to 99.9% and it’s a bit more troubling. A workday of outages per /year/ isn’t that easy to accomplish; you’ll probably need someone on pager duty all the time and might want a second server to run your stuff on – both cost money.
    Skipping four and going right for five nines is a whole different beast. We’re talking about five minutes of downtime per year. A decade of uptime with less than an hour of unavailability. You aren’t going to accomplish this with a single guy, a pager and two or three servers; instead you’ll need competent staff around the clock monitoring your redundant server clusters distributed across several continents and connected to various tier 1 providers. If you’re not seeing a pattern, let me give you a clue about the implications of that: Money. Cold hard cash. You’ll not only pay your staff and their offices but your uplink providers’ staff, your datacenters’ staff, their respective bosses’ bosses’ bonuses and whatnot.

    As a provider of anything, you’ll pass your expenses (eagerly) and savings (not so eagerly) on to your customers, creating what’s called a market. As such a customer, I, for one, usually don’t require five nines and will happily save 75% (or more) by reducing to 99.9% (or less). I’ll gladly trade a day of cell phone down time for another week of holidays and a day of not having a rental car available for a nice meal. There’s always landlines, phone booths, taxis and hitchhiking.
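
The downtime budgets quoted in the comment above are easy to check with a little arithmetic. The following is a minimal sketch in Python, assuming a 30-day month and a 365-day year; the constant and function names are hypothetical and chosen only for illustration.

```python
# Assumptions (not from the original comment): a 30-day month
# and a 365-day, non-leap year.
HOURS_PER_MONTH = 30 * 24
HOURS_PER_YEAR = 365 * 24


def downtime_hours(uptime_percent, period_hours):
    """Return the hours of downtime allowed in a period at a given uptime level."""
    return period_hours * (100 - uptime_percent) / 100


if __name__ == "__main__":
    # Monthly budgets for the two lower tiers.
    for level in (95, 99):
        hours = downtime_hours(level, HOURS_PER_MONTH)
        print(f"{level}% uptime: {hours:.1f} hours of downtime per month")

    # Yearly budgets for the higher tiers, shown in hours and minutes.
    for level in (99.9, 99.999):
        hours = downtime_hours(level, HOURS_PER_YEAR)
        print(f"{level}% uptime: {hours:.2f} hours ({hours * 60:.1f} minutes) per year")
```

Under these assumptions the budgets come out to 36 hours per month at 95% (three 12-hour outages), 7.2 hours per month at 99%, roughly 8.8 hours per year at 99.9%, and a little over five minutes per year at 99.999%, which lines up with the figures in the comment.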
