Two years ago the European HPC community surprised many in the HPC world by announcing the Mont-Blanc Project, an approach to energy-efficient high-performance computing using ARM processors. Since then, many major server vendors have announced plans for ARM-based servers, including HP’s Moonshot System, typically with energy efficiency as one of the major selling points. However, many HPC practitioners have been waiting for 64-bit ARM processors to become available before starting to experiment with ARM for HPC applications. That makes the results reported earlier this month by a team at the Large Hadron Collider (LHC) at CERN in Geneva all the more exciting.
According to an article describing the work, a team of LHC researchers ported the entire CMS software stack, including 125 external support packages, to a 32-bit ARMv7-based system. The result: an amazing 4x the events/minute/watt of two reference Xeon x86 systems. While many smaller experiments and tests have been run on ARM, few if any results for software systems as large as CMS have been reported before. In fact, the only software the CERN team reported being unable to run on ARM was some Oracle libraries, and they noted that “no standard Grid-capable CMS applications depend on Oracle”.
A number of vendors have announced plans to ship 64-bit ARM processors over the next 12 months, and the availability of those processors should spur ever more HPC work on ARM. At the same time, Intel is not standing still and continues to improve the energy efficiency of its Xeon processors. But ultimately, due to the very laws of physics that researchers at CERN study, the two-socket server is headed for the Computer History Museum. Today’s processors use on the order of 20 picojoules (pJ) of energy for a 64-bit floating-point operation. A 256-bit on-die SRAM access uses about 50 pJ. But an off-die link, even an efficient one like you might use to connect the processors in a two-socket server, consumes on the order of 500 pJ, roughly 25 times the energy of the floating-point operation itself. Increasingly, HPC architectures, whose design was dominated for decades by optimizing floating-point performance, will need to focus on minimizing data movement. In the years ahead, HPC systems, whether ARM- or x86-based, are likely to be at the forefront of single-socket server adoption.
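To make the energy argument concrete, here is a back-of-the-envelope sketch in Python using the per-operation figures quoted above. It assumes each figure is the cost of a single operation or transfer, and the operation counts in the example kernel are purely hypothetical; it is a rough illustration, not a model of any real processor.

```python
# Order-of-magnitude energy figures quoted in the text, in picojoules.
# Assumption: each value is the cost of one operation or one transfer.
FLOP_64BIT_PJ = 20.0           # one 64-bit floating-point operation
SRAM_ACCESS_256BIT_PJ = 50.0   # one 256-bit on-die SRAM access
OFF_DIE_LINK_PJ = 500.0        # one transfer over an efficient off-die link


def energy_pj(flops: int, sram_accesses: int, link_transfers: int) -> float:
    """Total energy (pJ) for a hypothetical mix of compute and data movement."""
    return (flops * FLOP_64BIT_PJ
            + sram_accesses * SRAM_ACCESS_256BIT_PJ
            + link_transfers * OFF_DIE_LINK_PJ)


# Relative cost of moving data versus computing on it.
print(SRAM_ACCESS_256BIT_PJ / FLOP_64BIT_PJ)  # 2.5
print(OFF_DIE_LINK_PJ / FLOP_64BIT_PJ)        # 25.0

# Hypothetical kernel: 1,000 flops with one on-die SRAM access per 4 flops.
local = energy_pj(flops=1000, sram_accesses=250, link_transfers=0)
# Same kernel, but one in ten of those accesses crosses an off-die link instead.
remote = energy_pj(flops=1000, sram_accesses=225, link_transfers=25)
print(local, remote)  # 32500.0 43750.0
```

In this toy example the arithmetic work is identical in both cases, yet sending just one in ten accesses across the die boundary raises total energy by about 35 percent, which is the kind of effect that favors keeping data on a single socket.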