HPC Design Challenges

Last week I had the opportunity to meet with a number of HP engineers, several of our HPC partners, and a few startups to discuss some of the design challenges of future HPC systems. Achieving Exascale performance by the end of the decade will require many new technologies to be developed, such as those being researched by the HP Labs Intelligent Infrastructure project. But even by the middle of the decade, HPC systems are likely to look quite different from today’s. Here are a few of the potential changes.

Power and cooling continue to gain importance in HPC as well as in other hyperscale environments such as the mega data centers being built by today’s social networking, cloud, and search companies. Computer systems engineers, typically with a background in electrical engineering or computer science, are now being required to think about the basics of plumbing and advanced thermodynamics. While many advanced cooling systems have been demonstrated by vendors over the last several years, few if any are ready to be economically deployed today at the scale of a 10,000-server HPC system, much less a 100,000-server mega data center. Promising techniques being worked on in the industry today include cooling servers with room-temperature water rather than the typical chilled water, as well as heat reuse. Systems such as the CLUMEQ supercomputer demonstrated the potential for heat reuse several years ago; the challenge now is to do this at the rack level with industry-standard components.
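
To put the plumbing challenge in perspective, consider a rough back-of-the-envelope figure (the per-server number is an assumption for illustration, not a measurement): at roughly 300 W per server, a 10,000-server system rejects on the order of 10,000 × 300 W = 3 MW of heat continuously, all of which has to be carried out of the racks and, ideally, put to productive use rather than simply handed off to a chiller plant.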

Faster storage. Sure, I can scale a Lustre parallel file system to dozens of PBs, and more and more HPC centers are looking at distributed technologies like Hadoop to solve the “big data” challenge, but what other fundamental technology changes are likely to impact HPC storage over the decade? Near term, there are many startups working to build higher-performance solutions out of SSD flash technology. The short-term (1-2 year) advances here are not going to come from revolutionary flash technologies; flash roadmaps are well understood and are delivering evolutionary improvements. However, the looming mainstream introduction of PCIe Gen3 server interconnects and new PCIe Gen3 flash controllers offers interesting possibilities. Taking advantage of the storage bandwidth possible with PCIe Gen3 will require rethinking the software interface. Strip away legacy storage protocols (FC, SCSI, SAS, etc.), and perhaps even the file system, and you have real possibilities. Longer term, today’s flash technology will give way to fundamentally new memory technologies, such as HP’s memristor, which promise not only new levels of performance but also significantly lower power usage.
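
As a rough illustration of what “stripping away the file system” can look like with today’s tools, here is a minimal Linux sketch that opens a flash block device directly with O_DIRECT and reads from it, bypassing the file system and the page cache. The device path and block size are assumptions for illustration, not a recommendation for any particular product.

    /* Minimal sketch: read a raw flash block device directly with O_DIRECT. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define BLOCK_SIZE 4096   /* O_DIRECT needs block-aligned buffers, sizes, offsets */

    int main(void)
    {
        /* Hypothetical device node for a PCIe flash card; adjust for your system. */
        int fd = open("/dev/yourflashdev", O_RDONLY | O_DIRECT);
        if (fd < 0) {
            perror("open");
            return EXIT_FAILURE;
        }

        void *buf = NULL;
        if (posix_memalign(&buf, BLOCK_SIZE, BLOCK_SIZE) != 0) {
            close(fd);
            return EXIT_FAILURE;
        }

        /* Read the first block straight off the device: no file system
         * and no page cache in the way. */
        ssize_t n = pread(fd, buf, BLOCK_SIZE, 0);
        if (n < 0)
            perror("pread");
        else
            printf("read %zd bytes directly from the device\n", n);

        free(buf);
        close(fd);
        return n < 0 ? EXIT_FAILURE : EXIT_SUCCESS;
    }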

Faster networking doesn’t just mean moving from 10G to 40G Ethernet or from QDR InfiniBand to FDR InfiniBand, but addressing the management and scalability of today’s HPC networks. It is going to be an exciting decade for networking, as the proliferation of “merchant silicon” for high-speed networking from the likes of Intel/Fulcrum, Mellanox, Broadcom, and others enables new startups to take on the industry giants, just as x86/Linux took on proprietary server vendors a decade ago. Of course, with a level playing field for the hardware COGS of a switch, my technology bet is on the players that bring differentiated software to the table. As a networking switch becomes little more than an x86 or other industry-standard processor paired with a commodity networking chip, the boundaries between servers and switches will become increasingly fuzzy. Is that your server acting like a switch, or is it your switch running apps?
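
To make that last point a bit more concrete, here is a minimal sketch of a Linux server behaving like a very naive two-port switch: it forwards raw Ethernet frames between two NICs using packet sockets. The interface names are assumptions; a real switch would also need MAC learning, many more ports, and hardware offload, and the program requires root privileges to run.

    /* Sketch: forward raw Ethernet frames between two NICs, i.e. a server
     * acting as a naive two-port "switch". */
    #include <arpa/inet.h>
    #include <linux/if_packet.h>
    #include <net/ethernet.h>
    #include <net/if.h>
    #include <poll.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    static int open_port(const char *ifname)
    {
        int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
        if (fd < 0) { perror("socket"); return -1; }

        struct sockaddr_ll sll;
        memset(&sll, 0, sizeof(sll));
        sll.sll_family   = AF_PACKET;
        sll.sll_protocol = htons(ETH_P_ALL);
        sll.sll_ifindex  = if_nametoindex(ifname);   /* bind socket to one NIC */
        if (bind(fd, (struct sockaddr *)&sll, sizeof(sll)) < 0) {
            perror("bind"); close(fd); return -1;
        }
        return fd;
    }

    int main(void)
    {
        /* Hypothetical interface names; adjust for your system. */
        int port[2] = { open_port("eth0"), open_port("eth1") };
        if (port[0] < 0 || port[1] < 0) return 1;

        struct pollfd fds[2] = {
            { .fd = port[0], .events = POLLIN },
            { .fd = port[1], .events = POLLIN },
        };
        unsigned char frame[ETH_FRAME_LEN];

        for (;;) {
            if (poll(fds, 2, -1) < 0) { perror("poll"); return 1; }
            for (int i = 0; i < 2; i++) {
                if (!(fds[i].revents & POLLIN)) continue;
                struct sockaddr_ll from;
                socklen_t len = sizeof(from);
                ssize_t n = recvfrom(port[i], frame, sizeof(frame), 0,
                                     (struct sockaddr *)&from, &len);
                /* Skip frames we sent ourselves to avoid a forwarding loop. */
                if (n > 0 && from.sll_pkttype != PACKET_OUTGOING)
                    send(port[1 - i], frame, (size_t)n, 0);
            }
        }
    }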

Add to the above list alternative, more power-efficient processors and their programming models, and HPC system designers have plenty to keep them busy for many years to come. A few recurring themes, though, echoed throughout the week. First of all, it’s hard to bet against open industry standards in the long run. Second, scale matters: winning technologies are going to be the ones that can be used throughout the industry, versus one-off specialized systems used by only one or two customers. Finally, there is lots of room for innovation, be it at the world’s largest technology companies or at 25-person startups. One thing, however, hasn’t changed: from the days of the first Cray-1 supercomputer to today, the HPC industry has been an exciting one to be in as it pushes the leading edge of technology.

About Marc Hamilton

Marc Hamilton – Vice President, Solutions Architecture and Engineering, NVIDIA. At NVIDIA, the Visual Computing Company, Marc leads the worldwide Solutions Architecture and Engineering team, responsible for working with NVIDIA’s customers and partners to deliver the world’s best end-to-end solutions for professional visualization and design, high performance computing, and big data analytics. Prior to NVIDIA, Marc worked in the Hyperscale Business Unit within HP’s Enterprise Group, where he led the HPC team for the Americas region. Marc spent 16 years at Sun Microsystems in HPC and other sales and marketing executive management roles. Marc also worked at TRW developing HPC applications for the US aerospace and defense industry. He has published a number of technical articles and is the author of the book “Software Development: Building Reliable Systems”. Marc holds a BS degree in Math and Computer Science from UCLA, an MS degree in Electrical Engineering from USC, and is a graduate of the UCLA Executive Management program.

3 Responses to HPC Design Challenges

  1. Pingback: Marc Hamilton on HPC Design Challenges | insideHPC.com

  2. Alan Morris says:

    I agree with your assessment and would suggest that the complexity of today’s HPC architecture and the rapid development of standard interface technology will begin to allow designers to collapse what were once individual commands into a streamlined processor set. This will provide a means to collapse many of the protocols and processing steps into standard routines. The innovation cycle is alive and well in HPC. The benefits to us all are just now being understood.

  3. Thanks, Alan, for your comment; I couldn’t agree more. We already saw Lustre drive a wave of innovation in the storage market, providing simplified, higher-performance, and lower-cost storage to the HPC market versus more traditional approaches such as SANs. The benefit of Lustre being open source is that multiple companies can innovate around the technology, and I’m certainly looking forward to where Lustre goes in the future, especially around utilizing flash technology to improve performance.
    Marc
