HPC and other hyperscale customers should be really excited about Intel’s launch today of the Intel® Xeon® Processor E5-2600 Product Family. The Register has a nice article detailing some of the new processor’s features, including this diagram of the processor.
Of particular interest to HPC customers should be Intel’s new AVX units on the cores, which can do twice as many floating point operations per cycle as existing Xeon X5600 processors (eight double-precision operations per core per cycle versus four). So while the FLOPS aren’t exactly free, you could say half the FLOPS are free compared to current Xeon processors. Not bad.
Look closer at the above chip diagram, though. While the functional blocks are not drawn 100% to scale, you will still notice that the execution units, which include the AVX logic, actually take up a relatively small part of the processor. Much of the processor die is actually taken up by L1 and L2 cache and associated logic, out-of-order scheduling, and other advanced features that give the processor its overall performance and allow AVX to actually deliver double the number of FLOPS. On the Xeon E5-2600 HPC cluster that HP delivered to Purdue University back in October 2011, AVX in fact boosted Linpack performance from 149 GFLOPS/node (measured with AVX turned off) to 294 GFLOPS/node with AVX turned on and Intel’s MKL math library. That is pretty darn close to double.
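To put a number on "pretty darn close to double", here is a quick back-of-the-envelope check using only the two per-node Linpack figures quoted above. The function and variable names are made up for this illustration, and the "ideal" 2x is simply the doubling of FLOPS per cycle that AVX promises, not a measured peak.

```python
# Per-node Linpack figures quoted in the post (GFLOPS/node).
GFLOPS_AVX_OFF = 149.0
GFLOPS_AVX_ON = 294.0

# AVX doubles the floating point operations per cycle, so 2x is the best case.
IDEAL_SPEEDUP = 2.0


def avx_speedup(gflops_off, gflops_on):
    """Return the measured speedup from turning AVX on."""
    return gflops_on / gflops_off


speedup = avx_speedup(GFLOPS_AVX_OFF, GFLOPS_AVX_ON)
fraction_of_ideal = speedup / IDEAL_SPEEDUP

print(f"Measured speedup: {speedup:.2f}x")                 # 1.97x
print(f"Fraction of ideal doubling: {fraction_of_ideal:.1%}")  # 98.7%
```

In other words, the Purdue nodes realized roughly 98.7% of the theoretical doubling, which only works because the rest of the chip can keep the AVX units fed.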
The Purdue “Carter” system, listed at #54 on the November 2011 Top500 list with an HPL score of 186.9 TF, used only 257 kW of power, which at the time was a record for a non-accelerated x86 system. Of course, even better performance/watt is possible using acceleration technology such as Nvidia’s Tesla GPUs, and upcoming Intel MIC and AMD Fusion APU technologies. So while a lot of hardware and software effort will continue to go into increasing the FLOPS/watt of a processor, as you can imagine by looking at the block diagram above, an increasing amount of power in future general purpose processors like x86 will go to cache and other functions, which I would group into “data movement” operations versus FLOPS operations.
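For a rough sense of what those Carter numbers mean in FLOPS/watt terms, the arithmetic can be sketched as follows. The HPL score and power draw come straight from the figures above. The node count of 648 is the number of SL230s systems in the Carter cluster, and treating all 648 as participants in the HPL run is my assumption, so take the per-node average as indicative only.

```python
# Figures quoted in the post.
HPL_TFLOPS = 186.9   # Top500 HPL score, in TFLOPS
POWER_KW = 257.0     # power used, in kW

# Assumption: all 648 SL230s nodes in Carter took part in the HPL run.
NODES = 648

# Convert to base units (FLOPS and watts), then express as MFLOPS per watt.
mflops_per_watt = (HPL_TFLOPS * 1e12) / (POWER_KW * 1e3) / 1e6

# Average sustained HPL performance per node, in GFLOPS.
gflops_per_node = (HPL_TFLOPS * 1e3) / NODES

print(f"Efficiency: {mflops_per_watt:.1f} MFLOPS/W")        # 727.2 MFLOPS/W
print(f"Average per node: {gflops_per_node:.1f} GFLOPS")    # 288.4 GFLOPS
```

Under that assumption the cluster-wide average of about 288 GFLOPS/node sits close to the 294 GFLOPS/node single-node figure, which suggests the system scaled well across the cluster.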
Intel’s MIC compilers and technologies like OpenACC, backed by Nvidia, are likely to continue to improve the FLOPS you can get out of your floating point hardware, but even the best compilers can’t extract parallelism from code if the underlying algorithm is serial. The challenge for software architects is thus rapidly changing from worrying about FLOPS to worrying about minimizing data movement (what all those other parts of the processor are mostly doing), which inherently requires you to think about the parallelism of your algorithm.
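As a toy illustration of why a serial algorithm defeats even a good compiler, compare a loop whose iterations are independent with one where each step needs the previous result. This is a minimal sketch in plain Python with hypothetical function names. It says nothing about how any particular compiler works; it only shows the difference in data dependencies.

```python
def scale_and_shift(xs, a, b):
    """Independent iterations: each output depends only on its own input."""
    return [a * x + b for x in xs]


def scale_and_shift_chunked(xs, a, b, chunks=4):
    """Same result, computed chunk by chunk.

    Because no chunk needs another chunk's output, each one could be
    handed to a separate core or vector unit.
    """
    size = max(1, (len(xs) + chunks - 1) // chunks)
    out = []
    for start in range(0, len(xs), size):
        out.extend(a * x + b for x in xs[start:start + size])
    return out


def recurrence(x0, a, b, n):
    """Loop-carried dependency: step i cannot start before step i-1 ends."""
    values = [x0]
    for _ in range(n - 1):
        values.append(a * values[-1] + b)
    return values


data = list(range(10))
# Splitting the independent loop into chunks does not change the answer.
same = scale_and_shift(data, 3, 1) == scale_and_shift_chunked(data, 3, 1)
print(same)                    # True
print(recurrence(1, 3, 1, 5))  # [1, 4, 13, 40, 121]
```

The first loop can be spread across AVX lanes or cores as it stands. The recurrence runs one step at a time unless someone rethinks the algorithm itself, and that is a job for the software architect rather than the compiler.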
But for today, congratulations to Intel on their launch. With AVX and a host of other new HPC improvements, the E5-2600 is going to be a great processor for HPC. And of course, when coupled with HP’s new ProLiant Gen8 servers, and HPC networking, storage, and management software from HP, you have one of the world’s most self-sufficient and powerful HPC solutions. While HP announced a whole range of new Gen8 servers powered by the E5-2600 processor today, two of the servers designed from the ground up for HPC are the SL230s Gen8 and SL250s Gen8. A good place to start learning more about the SL230s and SL250s is to click on “learn more” under the “See the Portfolio” banner on the ProLiant Gen8 launch page. HP has already shipped thousands of SL200 series servers as part of Intel’s “early ship” program, including the 648 SL230s systems in the Purdue Carter cluster.