Original Post From Bert Markgraf
Some customers watching Netflix on Christmas Eve 2012 saw their service deteriorate as their devices started to buffer the signal, trying to keep the video feed fluid. Eventually many screens froze because of an outage in the Amazon Web Services data center in Virginia. Netflix had most of its service restored by late Christmas Eve, and Amazon reacted quickly, solving the remaining problems with its system by Christmas Day. Netflix says it had purchased extra redundancy for its service, but the problem was with the load balancing rather than with the cloud service itself. While cloud systems from major vendors have a good availability record, companies using the cloud have to plan for rare outages and focus on how quickly suppliers can fix problems when they occur.
In this case, Netflix and Amazon reacted rapidly and customers faced only a few hours of downtime. The New York Times covers the outage in more detail and puts it in perspective with other recent cloud failures. The problem at Christmas was apparently an overload. Amazon posts notifications on its website when there are problems with its web services, and on December 24, 2012, those notifications indicated trouble with Elastic Load Balancing. This function reacts to high traffic levels and balances loads among multiple servers to prevent slowdowns and overloading. When it fails or acts incorrectly, individual servers may try to handle too much traffic. Services such as Netflix then slow down to such an extent that they seem to stop working, and customers can't see their streamed movies.
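To make the idea of load balancing concrete, here is a minimal sketch in Python. It is only an illustration of the general principle, not a description of how Amazon's Elastic Load Balancing works internally. The `Server` class, the capacity figure of 100 requests, and the "least loaded" selection rule are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    capacity: int      # hypothetical limit: concurrent requests before slowdown
    active: int = 0    # requests currently being handled

    @property
    def overloaded(self) -> bool:
        return self.active > self.capacity


def least_loaded(servers):
    """Pick the server with the fewest active requests (ties: first in list)."""
    return min(servers, key=lambda s: s.active)


def route(servers, requests, balanced=True):
    """Assign each request to a server and return the names of overloaded servers.

    With balanced=False every request goes to the first server, which mimics
    a balancer that has stopped spreading traffic.
    """
    for _ in range(requests):
        target = least_loaded(servers) if balanced else servers[0]
        target.active += 1
    return [s.name for s in servers if s.overloaded]


def make_pool():
    # Three hypothetical servers, each able to handle 100 requests at once.
    return [Server("a", 100), Server("b", 100), Server("c", 100)]


healthy = route(make_pool(), 240, balanced=True)
failed = route(make_pool(), 240, balanced=False)
print(healthy)  # no server exceeds its capacity
print(failed)   # the single server taking all traffic is overloaded
```

With working balancing, 240 requests spread evenly and each server stays under its limit. When the balancing step fails, the same traffic lands on one machine, which is the kind of overload described above.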
Midsize businesses are embracing the savings, high computing capacities, and low storage costs of cloud services and infrastructure, but these systems are highly sophisticated. The more complex a system, the more likely it is to experience occasional outages due to circumstances that the supplier did not anticipate. While in-house IT services also suffer occasional disruption, the key difference is that companies troubleshooting their own equipment will have some idea of progress and a timeline for the restoration of their service. For many cloud services today, such information is not available. Providers of cloud services focus on fixing the problems, and whatever information is available is often unreliable. Suppliers are hesitant to give out information that may change as additional factors become known and estimates for full service resumption shift.
For companies looking at transitioning critical functions to cloud services, this represents a problem. Ideally, competent suppliers would be confident enough to tell key customers exactly what they are doing to address service disruptions and to issue updates as the situation changes. In the absence of structures to disseminate such information, many companies will hesitate before putting critical functions in the cloud. They will free up data center capacity by outsourcing less important operations, but keep key company functions in-house, where they have firsthand information about any problems.
AltaFlux Corporation is a global HCM cloud consulting partner based in Troy, Michigan. We empower organizations by streamlining, transforming, and optimizing key human capital management (HCM) processes with industry-leading HCM cloud solutions like SAP SuccessFactors, Benefitfocus, WorkForce Software and Dell Boomi.