To PUE or not to PUE? Is that the question?
"OMG!" I hear you say! Not another blog wanting to debate the pros and cons of PUE!
I'm not writing this to re-open (did it ever close?!) the debate about PUE. I'm here to talk about how those of you who use it today to track your data center performance can greatly improve its value to you and your business.
But first a little history...
Many years ago in a land far far away...well it wasn't that far actually; it was Milan in northern Italy. Three guys sat around a dinner table chatting about how the data center industry just needed to start measuring something simple that gave an indication of how efficiently data centers were using energy.
I was one of the three along with Liam Newcombe (my CTO) and Christian Belady (Mr Data Center at Microsoft) and we'd just spent the day together at the European Commission's Joint Research Center (JRC) in Ispra, Italy, which is a very impressive campus where real science happens funded mostly by the EU member states.
Actually, while it's an impressive site, the large number of non-European attendees meant the meeting didn't take place inside the JRC but in the big meeting hall above the JRC tennis club, just outside the high-security fences of the JRC grounds themselves.
Christian had done a good job at that meeting of pitching PUE for use within the European Code of Conduct for data centers.
Luckily, Christian and Liam (who was the primary author of the original code and its best practice guidelines at the time) saw eye to eye about the use of PUE. It was the first time they'd met but it was clear to me (being mostly a spectator during much of the conversation that transpired over dinner) they were both cut from the same cloth.
With the might of the Green Grid and many vendors behind it, PUE went on to become the de facto metric for representing data center infrastructure efficiency (can you spot the irony there?).
Today many people spend many hours of their lives trying to explain to others in their company, usually the senior ranks, why the data center PUE getting "worse" (becoming a larger number) was not necessarily a "bad" thing and didn't necessarily mean they'd not done their job properly in terms of looking after the data center.
The problem with PUE (and clearly the industry knows that PUE is far from a perfect metric) is that using the absolute PUE number to track the performance of a site only tells part of the story.
"We know this already" I hear you say...
Yes, you already know that without asking what the corresponding utilization is, you can't really take a view on whether the number you have in front of you is good, bad or indifferent, or whether it can or should be improved (we already know that every data center reaches an inflection point where you start trading TCO for PUE). In an economized data center, you'd also be well advised to ask what the climate did.
Most organizations today target their data center managers on an absolute reduction of their site's PUE, but is that really a good plan? How do you know when you've reached that inflection point? If you keep shooting for as low as you can go once you're past it, you're unknowingly targeting the DC manager to increase your overall Total Cost of Ownership!
Looking at the absolute PUE value is also a bit of an unfair measure for the site manager, because site managers generally have no control over what happens with the IT load; servers come in, go out, their level of utilization fluctuates, etc. We already know that improving server utilization through virtualization and consolidation, for example, will often make your PUE worse because the total IT load goes down while part of the infrastructure overhead stays put. That drop in IT load is a good thing for any Enterprise IT operator, but for a colo operator it's generally a bad thing.
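To see why a falling IT load can push PUE up, recall that PUE is total facility power divided by IT power. The sketch below assumes, purely for illustration, that infrastructure overhead has a fixed part plus a part that scales with IT load; the function names and every figure are made up, not measurements from a real site.

```python
def facility_power_kw(it_load_kw: float, fixed_overhead_kw: float,
                      overhead_per_it_kw: float) -> float:
    """Total facility power under an assumed (illustrative) overhead model:
    a fixed overhead plus an overhead proportional to IT load."""
    return it_load_kw + fixed_overhead_kw + overhead_per_it_kw * it_load_kw


def pue(it_load_kw: float, fixed_overhead_kw: float,
        overhead_per_it_kw: float) -> float:
    """PUE = total facility power / IT power."""
    if it_load_kw <= 0:
        raise ValueError("IT load must be positive")
    total = facility_power_kw(it_load_kw, fixed_overhead_kw, overhead_per_it_kw)
    return total / it_load_kw


# Hypothetical site: 300 kW of fixed overhead, 0.2 kW of overhead per kW of IT load.
FIXED_KW = 300.0
PER_IT_KW = 0.2

# Consolidation cuts the IT load from 1000 kW to 600 kW.
for label, it_kw in [("before consolidation", 1000.0), ("after consolidation", 600.0)]:
    total = facility_power_kw(it_kw, FIXED_KW, PER_IT_KW)
    print(f"{label}: total {total:.0f} kW, PUE {pue(it_kw, FIXED_KW, PER_IT_KW):.2f}")
```

With these made-up numbers the site draws less power overall (1,500 kW down to 1,020 kW) while its PUE moves from 1.50 to 1.70, which is exactly the "worse" number the site manager then has to explain.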
In an economized site the PUE will vary significantly with the outside temperature. I've yet to meet a DC manager that has any control over their local climate, so a particularly warm year may mean it's simply impossible to meet their PUE reduction target for the year.
Dynamic setting of PUE targets and tracking performance against them
With the introduction of predictive system level modeling for data centers (and no I don’t mean a CFD model) it is possible to build a highly calibrated (98% calibration accuracy) model that will allow you to do a number of things:
- Verify that your data center is performing at its optimum PUE given the way it’s been designed and built and with the load and climate it’s operating with.
- Where it's not operating at its optimum PUE, the model is able to show you why, as well as where and how to improve it.
- Using the metered data from the site and continuously feeding in the actual climate data and actual IT load, the calibrated model of a now fully optimized data center will continuously and dynamically tell the site manager what the PUE "should be" if everything is working as expected - something we call the "expected PUE".
Now with this dynamically calculated “Expected PUE” to compare against the actual PUE, the target for the site manager should be to keep the “Expected vs Actual PUE” within an acceptable tolerance; remember the expected PUE will automatically adjust itself for variation in IT load and climate so it’s a fair and equitable target and more appropriately represents the actual domain of control that a site manager can impact.
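A minimal sketch of what tracking against that tolerance might look like is shown below. It is not the calibrated model itself: the expected PUE values are simply typed in where the model's output would go, and the `PueReading` type, the function names, the 3% tolerance and all the numbers are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class PueReading:
    label: str
    expected_pue: float  # would come from the calibrated model (hard-coded here)
    actual_pue: float    # would come from site metering (hard-coded here)


def relative_gap(reading: PueReading) -> float:
    """Fraction by which actual PUE exceeds (positive) or beats (negative) expected PUE."""
    return (reading.actual_pue - reading.expected_pue) / reading.expected_pue


def needs_attention(reading: PueReading, tolerance: float = 0.03) -> bool:
    """Flag a reading only when actual PUE is worse than expected by more than
    the tolerance. The 3% default is an assumed value, not a recommendation."""
    return relative_gap(reading) > tolerance


# Made-up readings: expected PUE already reflects that period's IT load and climate.
readings = [
    PueReading("cool month, normal operation", expected_pue=1.30, actual_pue=1.32),
    PueReading("hot month, normal operation", expected_pue=1.55, actual_pue=1.57),
    PueReading("cool month, cooling fault", expected_pue=1.30, actual_pue=1.40),
]

for r in readings:
    status = "INVESTIGATE" if needs_attention(r) else "ok"
    print(f"{r.label}: gap {relative_gap(r):+.1%} -> {status}")
```

Note that the hot month passes despite having the highest absolute PUE, because the expected value moved with the climate, while the cool month with the fault is flagged even though its absolute PUE is lower.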
Now of course you may say "well I could still improve the PUE by making more impactful changes" and you’d be right. Whether it’s increasing set points, changing to a different control strategy or upgrading to more efficient drives or equipment, all of these things could well improve the site's PUE.
Another benefit of having a calibrated predictive model is that you can now rapidly try out all the different things you might do to your site to improve its PUE. If the model is capable of modeling cost as well as PUE, you can make some really well informed decisions about what actions you might take to reduce the absolute PUE of your site, without unknowingly going past that TCO vs PUE inflection point.
Don't sit back and think you’ve just got to live with being beaten regularly with the internal PUE stick! There is a much smarter, more significant and valuable way to use this important industry metric that will help you manage and reduce PUE using meaningful and achievable targets that take account of all the variables that impact the site’s performance.
Zahl Limbuwala, CEO of Romonet