Is RIM in Need of More Redundancy to Prevent Outages?

The recent BlackBerry outages have highlighted a conflict of interest at RIM. On the one hand, RIM’s core customer base and competitive advantage come from offering the most secure smartphone on the market, and thus the business standard. On the other hand, RIM’s net new subscribers are consistently non-enterprise users, who care less about security and more about apps and their smartphone lifestyle.

One of the underlying causes of BlackBerry outages, aside from carrier-side problems, is the architecture back at Waterloo. BlackBerry service for consumers is routed through RIM’s Network Operations Center (NOC), which gives RIM more control over encryption and security than a distributed solution would. While a centralized network provides more security, it also means catastrophic failure if you lose the NOC.

So should RIM be investing in more redundancy and a distributed network solution, or stick to the NOC architecture that made it so successful in the first place? Carmi Levy, a Canada-based independent technology analyst and journalist, has published his thoughts for Beta News.

Your thoughts?

  • DM

    My thoughts? Leave it in the control of the NOC, but invest in a multi-routing solution. If they can handle BES and BIS as a split, then why not have multi-routing? Have 4 chain-routed systems, each capable of holding the entire network. If they feel that an upgrade is needed, move everyone off server 1 and push the update there, then move some people back and see if there are problems. If everything is clear, move everyone over to server 1 and push the update to the other 3. If it didn’t work, move the affected users back to server 2 and roll back the version on server 1. Yes, I know it is redundancy… but how can you lose with a system like that?

  • http://www.builtbyrequest.weebly.com/ AlucardFair

    @DM, well put, bud! I would rather RIM be redundant than have the entire network affected. I depend on my device more for its stable content and security than for apps. I do use apps, but I mainly use my device for quick browsing and email. I can’t do that if the server is down. I say leave it in the hands of the NOC and just upgrade it.

  • http://caspan.com/ Caspan

    Kyle, you got my brain going and gave me the inspiration to do my 4th installment of “RIM should have thought of this first!” Since my answer to your questions is long, here is a link to my blog.

    http://caspan.com/?p=54

  • http://papogp.wordpress.com/ Diego Nei

    @DM. I agree completely.

  • http://caspan.com/ Caspan

    @DM I agree with you, but again we are all assuming they don’t already have this. As I mentioned in my blog entry, the issue behind the past 2 outages would still have occurred no matter how many NOCs or how much redundancy RIM has. The only solution is to allow users to use insecure sources other than their NOC to get data to the device. But the second you disconnect from the NOC, you lose all NOC-provided services like Windows Live Messenger, BBM, Google Talk, etc.: all those great free services.

    One thing that I did not think of yesterday: how did BBM manage to take down their master database that has control over everything else? Maybe they should start segmenting and tiering their databases by application so one major flaw can’t affect the rest of the database servers running other services. But again, I would assume they did have this, and BBM was making calls back to a master database for user information or other info that is not stored in the BBM database, which, if you can imagine, would be just like a DoS attack from their own application. Kinda hard to stop that kind of attack. It’s like the military satellites programmed to ignore our own missiles, but then we shoot one at ourselves. Kinda hard to defend against yourself without some reprogramming.

    I guess the programmers or network admins have never seen the X-Files’ “Trust No One.” It’s a security model I live by, because a lot of the time the attack you expect will never happen but the one you don’t will!

  • http://appworld.blackberry.com/webstore/vendor/1111 Ebscer

    I think breaking up their backend into smaller geography-based networks would help. In the event that something like this happens again, it would be likely that only some areas would be affected. It would not prevent problems, but it would limit them to, say, California instead of all of North America…
