By Andrew Jones.
There are many names for it. There are many descriptions of it. There are many factors that contribute to it. But no matter what you call it, how you paint it, or what you think causes it, the issue remains: it is difficult to predict when your job will complete in a public cloud environment.
This problem is the topic of many academic and corporate research projects, but from what I have seen, no one has solved it. For this discussion, I am referring strictly to virtual machines whose CPU (or CPUs) are uncapped. In most cases, public cloud offerings provide uncapped virtual machines (VMs). An uncapped VM takes advantage of unused CPU cycles on its physical node: when the VM is busy, it is allowed to use cycles that would otherwise sit idle. This allows customers to get more than they are paying for.
The negative aspect of this approach is that the performance of a job is nearly impossible to predict in this type of environment. A job run today might complete in two hours, but the exact same job might take four hours tomorrow. The typical cause of the difference is other VMs doing real work on the same physical node as the job in question. The competing work could be the same customer’s jobs on the same VM, the same customer’s jobs on another VM collocated on the same physical node, or another customer’s jobs on another VM collocated on the same physical node. The latter two are often referred to as the “your own worst enemy” and “noisy neighbor” problems of cloud computing.
Does this mean I’m being cheated by my cloud provider? The answer is no. Cloud providers usually promise a minimum rating for CPU performance; however, they often deliver a lot more. In essence, customers are typically getting a lot more than they are paying for. From a provider’s standpoint, however, the hidden danger of operating in this manner is that customers become accustomed to a level of performance that in some cases far exceeds the minimum being promised. Dissatisfaction arises when customers notice that their jobs are running much slower than they did the month before.
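To make the two-hour versus four-hour example concrete, here is a minimal sketch of how the same job can take twice as long when the extra cycles disappear. It assumes a purely CPU-bound job and a simple linear model; the function name, the share values, and the amount of work are all hypothetical numbers chosen for illustration, not anything a particular provider publishes.

```python
def runtime_hours(cpu_hours_of_work: float,
                  guaranteed_share: float,
                  borrowed_share: float = 0.0) -> float:
    """Estimate wall-clock hours for a CPU-bound job.

    cpu_hours_of_work: hours the job would need with one full CPU (assumed).
    guaranteed_share:  fraction of a CPU the provider promises (assumed).
    borrowed_share:    extra fraction picked up from idle neighbors (assumed).
    """
    # An uncapped VM can borrow idle cycles, but never more than the whole CPU.
    effective_share = min(1.0, guaranteed_share + borrowed_share)
    if effective_share <= 0:
        raise ValueError("the job needs a positive CPU share to make progress")
    return cpu_hours_of_work / effective_share


# Quiet node: the VM borrows the idle half of the CPU.
quiet_day = runtime_hours(2.0, guaranteed_share=0.5, borrowed_share=0.5)

# Busy node: neighbors use their cycles, so only the promised minimum is left.
busy_day = runtime_hours(2.0, guaranteed_share=0.5)

print(f"quiet day: {quiet_day:.1f} h, busy day: {busy_day:.1f} h")
```

Under these assumptions the provider delivers exactly what it promised on the busy day; the job only looks slow because the quiet day set the expectation.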
If you have a job that needs to complete by a certain time (for example, a time-sensitive quarterly report), there are several things you can do to get the most out of your cloud, with the goal of having your job complete in a predictable time frame. By no means is the list that follows an exhaustive one. As I mentioned earlier, there is a lot of research in this area, and I am sure much brighter minds are trying to solve the problem as I type this and as you read it. Your mileage will vary, but depending on your job, the tools you are using, and the underlying architectural and business design of your cloud provider, I hope you can find one or two of these ideas helpful.
And if all else fails, determine the worst-case scenario for your job. This may be hard to do, depending on what your provider publishes about the physical architecture of its cloud environment. But if you can determine the worst-case performance and your job is predictable from run to run, you should be able to work backward from your job deadline to determine when you must start it in order to complete on time.
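Working backward from the deadline is simple date arithmetic. The sketch below assumes you already have a worst-case runtime estimate for your job; the function name, the safety margin, and the dates are hypothetical and only there to show the calculation.

```python
from datetime import datetime, timedelta


def latest_start(deadline: datetime,
                 worst_case_runtime: timedelta,
                 safety_margin: timedelta = timedelta(0)) -> datetime:
    """Return the latest time the job can start and still finish on time.

    worst_case_runtime is your own estimate of the slowest plausible run;
    safety_margin is optional extra slack for retries or queueing.
    """
    if worst_case_runtime < timedelta(0) or safety_margin < timedelta(0):
        raise ValueError("durations must not be negative")
    return deadline - worst_case_runtime - safety_margin


# Hypothetical example: a report due at 5:00 pm, a four-hour worst case,
# and thirty minutes of slack.
report_due = datetime(2024, 3, 29, 17, 0)
start_by = latest_start(report_due,
                        worst_case_runtime=timedelta(hours=4),
                        safety_margin=timedelta(minutes=30))
print("Start the job no later than", start_by.strftime("%Y-%m-%d %H:%M"))
```

The result is only as good as the worst-case estimate, so it is worth re-measuring if your provider changes instance types or you change the job.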
As mentioned earlier, this list is definitely not exhaustive. What do you do? What would you suggest?