SC10 Preview

For HP fans, SC10 starts on the evening of Thursday, November 11th, with the Hewlett-Packard Consortium for Advanced Scientific and Technical Computing (HP-CAST) user group meeting. The theme for our user group meeting is Energy Efficient Clusters from Mid-range to Peta-scale Systems, and I thought I would say a few words on that topic today.

One of the highlights of HP-CAST will definitely be Professor Satoshi Matsuoka from Tokyo Institute of Technology (Tokyo Tech) sharing early results from the TSUBAME2 supercomputer, which is based on the recently introduced HP ProLiant SL390s G7 Server. The SL390s is purpose-built for energy-efficient HPC. Its basic building block is a two-socket server using Intel Xeon 5600 series “Westmere-EP” processors. But that is where any comparison to other servers ends.

Recognizing the ever-increasing number of applications being optimized for GPUs, HP engineers packed up to three NVIDIA “Fermi” 2050 or 2070 GPUs into the SL390s for industry-leading performance/watt and price/performance. Of course, all that performance does you no good if you can’t get data into and out of the node, so HP added a Mellanox ConnectX-2 networking chip to the SL390s motherboard. The ConnectX-2 is a dual-mode chip supporting either low-latency 10G Ethernet or 40G InfiniBand.

The SL390s comes in several different configurations, letting you custom-design each node for optimum performance on your specific mix of applications. The basic model (no GPUs) packs eight 2-socket servers into the SL6500 chassis, which takes up just 4RU (rack units) of space. The GPU version packs four 2-socket servers, with a total of up to 12 GPUs, into the same SL6500 chassis. No other x86 server provides more processing power, I/O, and memory in such a dense package.
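To put those chassis numbers on a per-rack-unit footing, here is a minimal sketch that uses only the figures quoted above (a 4RU SL6500 chassis, eight or four 2-socket servers, up to 12 GPUs). The `ChassisConfig` class and its field names are hypothetical, made up for this illustration, and are not part of any HP tool.

```python
from dataclasses import dataclass

# SL6500 chassis height in rack units, as stated in the text.
CHASSIS_RU = 4


@dataclass(frozen=True)
class ChassisConfig:
    """Hypothetical container for one fully populated SL6500 chassis."""
    name: str
    servers: int
    sockets_per_server: int
    gpus: int

    def per_rack_unit(self) -> dict:
        """Divide each chassis total by the chassis height (4RU)."""
        return {
            "servers": self.servers / CHASSIS_RU,
            "sockets": self.servers * self.sockets_per_server / CHASSIS_RU,
            "gpus": self.gpus / CHASSIS_RU,
        }


# Figures taken from the paragraph above.
basic = ChassisConfig("basic (no GPUs)", servers=8, sockets_per_server=2, gpus=0)
gpu = ChassisConfig("GPU", servers=4, sockets_per_server=2, gpus=12)

for cfg in (basic, gpu):
    print(cfg.name, cfg.per_rack_unit())
```

In other words, the basic model trades GPUs for CPU density (four sockets per rack unit), while the GPU version halves the socket count per rack unit to make room for up to three GPUs in that same unit of space.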

HPC clusters built out of the SL390s can deliver up to 1 TeraFlop of peak performance per RU. Of course, peak performance is only that: a theoretical peak. What is much more important is actual sustained performance. While sustained performance varies greatly by application, HP has demonstrated over 75% efficiency (sustained performance/peak performance) across multiple SL390s nodes. We believe even small clusters built out of a few SL390s nodes will enable substantially greater advances in science, as these new levels of performance are delivered at a price point (and performance/watt level) never achievable in the past.
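As a rough illustration of the efficiency ratio defined above (sustained performance divided by peak performance), the sketch below plugs in the two figures from this post: up to 1 TeraFlop of peak per RU and a demonstrated efficiency of over 75%. The function names are hypothetical, and since 75% is quoted as a lower bound and efficiency varies by application, the result should be read as an illustrative floor rather than a measurement.

```python
def efficiency(sustained_tflops: float, peak_tflops: float) -> float:
    """Efficiency as defined in the text: sustained / peak."""
    if peak_tflops <= 0:
        raise ValueError("peak performance must be positive")
    return sustained_tflops / peak_tflops


def sustained_at(peak_tflops: float, eff: float) -> float:
    """Sustained performance implied by a given peak and efficiency."""
    if not 0.0 <= eff <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return peak_tflops * eff


# Figures quoted in the post; both are upper/lower bounds, not measurements.
PEAK_TFLOPS_PER_RU = 1.0   # "up to 1 TeraFlop of peak performance per RU"
DEMONSTRATED_EFF = 0.75    # "over 75% efficiency"
CHASSIS_RU = 4             # SL6500 chassis height in rack units

chassis_peak = PEAK_TFLOPS_PER_RU * CHASSIS_RU
chassis_sustained = sustained_at(chassis_peak, DEMONSTRATED_EFF)

print(f"peak per chassis:      {chassis_peak:.2f} TFLOPS")
print(f"sustained per chassis: {chassis_sustained:.2f} TFLOPS")
print(f"efficiency:            {efficiency(chassis_sustained, chassis_peak):.0%}")
```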

As HPC experts recognize, the larger the cluster you build, the more difficult it is to achieve the same level of efficiency. However, because the SL390s delivers a much more balanced combination of compute and I/O than first-generation GPU compute nodes, we expect the TSUBAME2 cluster to significantly raise the bar for efficiency on GPU clusters. HP customers attending HP-CAST will be among the first to hear the official TSUBAME2 Top500 efficiency results.

Performance and efficiency are critically important at TSUBAME2 scale, but HP’s view on getting to sustainable, repeatable ExaScale computing doesn’t just involve building one or two supercomputers used by a few researchers. I actually applaud Tokyo Tech for giving each of their more than 10,000 undergrad students an account on the TSUBAME2 system so they can start developing supercomputer-scalable codes early in their higher education careers. What we are seeing, however, is that with the advances in performance/watt and price/performance enabled by GPUs, supercomputing is now reaching more places than ever before. A great example of this is the research work of Professor Lorena A. Barba at Boston University, home of the Pan-American Advanced Studies Institute. This coming January, in Valparaíso, Chile, the institute will sponsor a conference and training seminar on Scientific Computing in the Americas: the challenge of massive parallelism. And after the seminar, when she leaves her hometown of Valparaíso and returns to BU, Professor Barba can comfort herself in the knowledge that her fellow researchers in Chile can continue developing new codes and applications for ExaScale computing, not on a specialized $1M+ supercomputer, but on just a few SL390s nodes delivering over 1TF per rack unit, a level of supercomputing that was simply never available before in many parts of the world.

So that is my preview of HP-CAST for today. In a nutshell: Energy Efficient Clusters from Mid-range to Peta-scale Systems. From Chile to Japan, Germany to New Orleans, HP is enabling new levels of access to supercomputing power, as well as powering some of the world’s fastest supercomputers, with the new purpose-built SL390s HPC compute node.

About Marc Hamilton

Marc Hamilton – Vice President, Solutions Architecture and Engineering, NVIDIA. At NVIDIA, the Visual Computing Company, Marc leads the worldwide Solutions Architecture and Engineering team, responsible for working with NVIDIA’s customers and partners to deliver the world’s best end-to-end solutions for professional visualization and design, high-performance computing, and big data analytics. Prior to NVIDIA, Marc worked in the Hyperscale Business Unit within HP’s Enterprise Group, where he led the HPC team for the Americas region. Marc spent 16 years at Sun Microsystems in HPC and other sales and marketing executive management roles. Marc also worked at TRW developing HPC applications for the US aerospace and defense industry. He has published a number of technical articles and is the author of the book “Software Development: Building Reliable Systems”. Marc holds a BS degree in Math and Computer Science from UCLA and an MS degree in Electrical Engineering from USC, and is a graduate of the UCLA Executive Management program.