The CPU is the hottest part of any computer, and CPUs are getting hotter with each succeeding generation. Why is this the case? An easy way to think of a CPU is as a huge set of interconnected roads, each carrying electrons to their appointed destinations, where they flip transistors to 1 or 0.
As CPU fabrication has become more sophisticated, the distance between these roadways has shrunk from over 100 nanometers (nm) to today's 20 nm or less, allowing chip manufacturers to pack in more transistors. In fact, the transistor count in the newest SPARC processor, the SPARC64 XII, is more than 5x that of the SPARC64 VII+. However, packing more transistors closer together increases heat output per square centimeter.
Another factor in the heat output formula is the frequency of the chip. To use our roadway analogy, this is the speed at which the cars (electrons) travel their various paths. The higher the frequency, the more heat the chip produces. The aforementioned SPARC64 VII+ ran at 3.0 GHz, while the current SPARC64 XII can run at up to 4.25 GHz. That is more than a one-third increase in clock speed and performance, but it comes at the cost of significantly higher heat output.
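The frequency-heat link above follows the classic dynamic-power rule of thumb, P ≈ C · V² · f: switching power grows linearly with clock frequency (and with the square of supply voltage). The sketch below is purely illustrative – the capacitance and voltage values are placeholders, not SPARC64 specifications.

```python
def dynamic_power(capacitance_f, voltage_v, frequency_hz):
    """Approximate dynamic (switching) power in watts: P = C * V^2 * f."""
    return capacitance_f * voltage_v ** 2 * frequency_hz

# The same hypothetical chip at two clock speeds, identical voltage:
p_slow = dynamic_power(1e-9, 1.0, 3.00e9)  # a ~3 GHz part
p_fast = dynamic_power(1e-9, 1.0, 4.25e9)  # a ~4.25 GHz part

# Frequency alone raises switching power by the same ~42% as the clock.
print(f"Power ratio: {p_fast / p_slow:.2f}x")
```

In practice the gap is usually wider still, because higher clocks often require higher supply voltage, and voltage enters the formula squared.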
High CPU heat is a bad thing. When the CPU gets too hot, it throttles its frequency in order to bring temperatures down. Heat also limits system density, since hotter systems need more airflow to keep temperatures under control. Finally, heat is the enemy of durability – hot CPUs (and other components) have shorter lifespans.
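The throttling behavior described above amounts to a simple control loop: when the die temperature crosses a limit, the governor steps the clock down. This toy model uses made-up thresholds, not real firmware values, just to make the mechanism concrete.

```python
MAX_TEMP_C = 90.0   # illustrative thermal limit, not a vendor spec
STEP_GHZ = 0.25     # illustrative frequency step
FLOOR_GHZ = 1.0     # illustrative minimum clock

def throttle(freq_ghz, temp_c):
    """Return the next clock frequency given the current die temperature."""
    if temp_c > MAX_TEMP_C:
        # Too hot: shed heat by slowing down, but never below the floor.
        return max(FLOOR_GHZ, freq_ghz - STEP_GHZ)
    return freq_ghz

print(throttle(4.25, 95.0))  # over the limit -> clock steps down
print(throttle(4.25, 70.0))  # within limits  -> clock unchanged
```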
Cooling Solutions Contrasted
The vast majority of CPUs today are air cooled, with large (and growing larger) heat sinks perched on top of the chips. These large heat sinks aren't especially efficient, even though they incorporate fans and even liquid-filled heat pipes.
These air cooled CPUs dump their heat inside the data center, and that heat has to be removed, usually through air conditioning. The typical data center has to be kept cooler than the rest of the facility just to ensure that the systems don't bake themselves into failure.
Air cooling is an expensive proposition: estimates of annual data center environmental costs range from 20% to 40% of total data center spending. Liquid cooling is a much better alternative. Liquid conducts roughly 25x more heat than air, making it a far more efficient cooling medium.
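The "roughly 25x" figure lines up with the thermal conductivity gap between water and air; a quick back-of-the-envelope check with textbook room-temperature property values (exact figures vary with temperature and the coolant used) shows that, per unit volume, the gap in heat storage capacity is larger still.

```python
# Textbook room-temperature properties (approximate):
k_water = 0.60    # thermal conductivity of water, W/(m*K)
k_air = 0.026     # thermal conductivity of air, W/(m*K)

# Volumetric heat capacity = specific heat * density:
cv_water = 4186 * 998    # J/(m^3 * K) for liquid water
cv_air = 1005 * 1.2      # J/(m^3 * K) for air

print(f"Water conducts heat ~{k_water / k_air:.0f}x better than air")
print(f"Water stores ~{cv_water / cv_air:.0f}x more heat per unit volume")
```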
There are a couple of different ways to implement liquid cooling for CPUs. The first is to attach a liquid cooling block to each CPU and circulate liquid through the blocks and into a heat exchanger, usually located at the bottom of the server rack. This method requires that the data center have a source of chilled water to carry the heat out of the building. It also requires data center personnel to attach the water blocks to the CPUs and to plumb the heat exchanger into the data center's cold water supply.
Another method is to completely immerse the server in a trough of liquid – typically mineral oil or some other non-conductive fluid. This dissipates a very large amount of heat per square centimeter and is the most efficient cooling technology available today. However, it comes at quite a cost: servers have to be 'liquid proofed', drives need to be sealed, and vertical racks must be replaced by horizontal liquid troughs. Data centers still need a cool liquid source to chill the mineral oil or whatever other fluid is used as the heat vehicle.
A third method is the kind of cooling seen on the Fujitsu M10 server. It's a liquid-air hybrid that provides strong cooling without the downsides associated with traditional liquid cooling. As can be seen in Figure 1, a hollow cold plate (labeled 'container') attaches to the CPU and circulates cool liquid inside the plate to keep the CPU temperature in check. It is a completely closed loop, which means no maintenance and no need for additional data center plumbing. The system is factory installed and tested, and has no connectors, so there is almost no chance of leakage.
The Fujitsu M10 is a great example of closed-loop single-phase liquid cooling. However, the Fujitsu SPARC M12 does it one better with highly advanced two-phase liquid cooling, called VLLC (Vapor and Liquid Loop Cooling).
The difference between single-phase and two-phase cooling is that in single-phase cooling, the liquid (usually treated water) remains a liquid through the whole cooling cycle. In two-phase cooling, the liquid flashes to vapor when it touches the hot CPU, and it is this vapor that carries the heat away. Vapor carries far more heat than liquid – depending on the fluid, as much as 30-100x more. The fluids used in two-phase liquid cooling are highly engineered: they have a much lower boiling point than water, meaning they vaporize at lower temperatures and can thus keep a CPU well below the rule-of-thumb maximum temperature of 60 degrees Celsius.
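The advantage of boiling comes from latent heat: vaporizing a fluid absorbs far more energy than merely warming it. The sketch below uses water's properties as an illustration (the engineered fluids in two-phase systems have lower boiling points, but the same qualitative advantage); the 10-degree coolant temperature rise is an assumed, illustrative value.

```python
# Sensible heat vs. latent heat, using water as the example fluid:
c_p = 4186        # specific heat of liquid water, J/(kg*K)
h_vap = 2.26e6    # latent heat of vaporization of water, J/kg
delta_t = 10      # assumed coolant temperature rise while staying liquid, K

sensible = c_p * delta_t   # heat absorbed per kg by warming 10 K
latent = h_vap             # heat absorbed per kg by boiling

print(f"Boiling absorbs ~{latent / sensible:.0f}x more heat per kg "
      f"than a {delta_t} K liquid temperature rise")
```

The resulting ratio falls squarely inside the 30-100x range cited above.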
Figure 4 shows how Fujitsu has implemented two-phase VLLC on the Fujitsu SPARC M12.
As you can see in the picture, the heat from the SPARC64 XII causes the engineered liquid to boil and generate vapor. This vapor travels up the red side of the loop, where it meets cool air blown by the system fans. The airflow cools the loop, causing the vapor to condense back into liquid, which then travels back to the CPU to begin the process all over again.
Fujitsu’s VLLC is factory installed and requires no maintenance from the user. It also doesn’t require any additional facility plumbing, just the normal air conditioning that is already present in the data center.
With VLLC and adequate data center cooling, the Fujitsu SPARC M12 system can run at up to 4.25 GHz, a considerable improvement over preceding CPUs. This extra performance translates to faster workload processing, quicker application completion, and better service to users. Those tiny bubbles make a big difference when it comes to cooling.