This week I was lucky enough to spend a day and a half in Lugano, Switzerland, attending the first part of the Cray User Group. It was great to meet up with old friends from Cray, their partners, and their customers. I was legitimately attending not only as a partner but as a Cray user: my team at NVIDIA operates not one but two Cray systems, a Cray XK7 and a Cray XC30, both of course with NVIDIA GPUs, as part of our customer benchmark center. Our benchmark center also has hundreds of different x86 servers arranged into multiple clusters, all available to our customers for benchmarking.
The theme of this year’s Cray User Group, Back to the Future of Supercomputing, made me think back a decade to when I was responsible for the HPC benchmarking center at Sun Microsystems. At the time, the Sun center had some large (up to 64-socket) SPARC-based systems, some small, low-power multicore/multithreaded SPARC Niagara (aka UltraSPARC T1) based systems, and a lot of different x86-based servers which Sun was just starting to build. It was a challenge then for our benchmarking team to find the best mix of systems to maximize performance for any given customer workload.
While already equipped with quite a heterogeneous mix of systems and processors today, the NVIDIA benchmarking center promises to get a lot more heterogeneous over the coming year. Soon we will be getting our first OpenPower-based servers with GPUs, and after that I expect quite a few different ARM-64-based servers supporting GPUs. I keep telling my team of solutions architects that this is great job security: as the mix of systems available on the market increases, customers have an increasing need for trusted technical advisors to help recommend the best systems.
Of course, the best system isn’t always easily defined. If you only consider two-socket x86 systems with 16 DIMM slots, the best system is often defined as the lowest-cost system. Even there, though, you need to be careful to consider CPU speed and power, DIMM speed and power, power supply efficiency, networking capabilities, and cooling efficiency. And cooling efficiency, as increasingly does power supply efficiency, requires knowledge of the customer’s data center infrastructure. At the same time, if you limit yourself to a two-socket x86 server with 16 DIMMs, you aren’t going to get a lot of innovation in that space.
With IBM refocusing their server and HPC efforts on OpenPower with NVIDIA GPUs, multiple ARM-64-based processors set to enter the market in the next twelve months, continued innovation in x86, and every processor vendor adopting accelerators of some sort, there is going to be a lot more innovation in the server space. While this leads to more choices for customers, it doesn’t necessarily have to be more complicated. A customer from the Lugano-based Swiss National Supercomputing Centre, CSCS, stated their future system goals very succinctly: “kilowatts to job completion”. CSCS knows a thing or two about supercomputing, as they run Piz Daint, the largest supercomputer in Europe, which also ranks as the greenest petaflop supercomputer in the world according to the Green500 list. A Cray XC30, Piz Daint pairs one NVIDIA GPU with each x86 processor in the system to achieve its performance and energy efficiency.
Finally, to end my blog, a bit of a blatant recruiting pitch. As noted above, the NVIDIA customer benchmarking center is growing. That means we are looking for another HPC sysadmin. If the thought of working with all the latest NVIDIA GPUs before they are announced, along with all the latest supercomputers from Cray and all our other systems partners, and getting a jump on learning new OpenPower and ARM-based servers, appeals to you, get in touch with me. The position is based at our Santa Clara headquarters.