The Last ExaFLOP Supercomputer

Few people question that an ExaFLOP supercomputer will be built in the coming years. But rather than write about the first ExaFLOP system, I thought I would ponder the last one.

If you believe HPC veteran Thomas Sterling, Professor of Informatics & Computing at Indiana University, quoted in his recent interview ‘I Think We Will Never Reach Zettaflops’, you could postulate that the last ExaFLOP system will never be built.

But as with all challenging problems, it is good to get more than one opinion. The title of Greg Papadopoulos’ recent talk, How to Design and Build Your Very Own Exascale Computer, makes ExaFLOP computing sound almost trivial. Admittedly, Greg expands the concept of ExaFLOP beyond pure supercomputing to ExaSCALE, including large systems such as those that Google, eBay, or the HP Cloud may run in the future.

No matter what you think about the future of ExaFLOP and ExaSCALE computing, the experts pretty much agree that improving performance per watt will be one of the key challenges. Using today’s commercially available technology, an ExaFLOP system would require about 1,000 megawatts of power. That is about 50 times the US Department of Energy’s 20 megawatt target for future ExaFLOP systems.
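To put that gap in numbers, here is a back-of-the-envelope sketch using the rough figures quoted above (estimates, not vendor specifications):

```python
# Back-of-the-envelope arithmetic using the figures quoted above
# (rough estimates, not vendor specifications).
EXAFLOP = 1e18            # floating point operations per second

power_today_mw = 1000     # approximate draw with today's commercial technology
power_target_mw = 20      # US Department of Energy target

def gflops_per_watt(flops, power_mw):
    """Sustained GFLOPS delivered per watt of power."""
    return (flops / 1e9) / (power_mw * 1e6)

print(gflops_per_watt(EXAFLOP, power_today_mw))   # ~1 GFLOPS/W today
print(gflops_per_watt(EXAFLOP, power_target_mw))  # ~50 GFLOPS/W needed to hit 20 MW
print(power_today_mw / power_target_mw)           # the roughly 50x gap
```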

For those of you who have difficulty conceptualizing 20 or 1,000 megawatt supercomputers, here are a few reference points, courtesy of @datacenter. One of the largest data centers I have seen written about is the new CyrusOne 110 megawatt data center going up in Phoenix. CyrusOne is a co-location company; it doesn’t build data centers for its own servers but rents out space to other companies. So, almost by definition, the CyrusOne center will house many separate computers, not a single 1/10 ExaSCALE system.

Even when we are able to build ExaFLOP systems that run on 20 MW, that is still a huge power bill, and most likely a big environmental impact, at least if they run on non-renewable energy sources such as the relatively low-cost coal-fired power driving some of today’s largest data centers. Luckily, those who operate big data centers are starting to care about the environment a bit more, and today we see multiple data centers using solar and other renewable power for 2-20 megawatts of their needs. Still, data centers are estimated to be using 10% of US energy today.

Of course, besides addressing the supply side, technology vendors are working every day to attack the demand side of data center power usage, from the basic processing chips to the network to the cooling systems. Not surprisingly to anyone who has ever had to pick up an overheating laptop off their lap after watching their favorite YouTube HD video, cooling large data centers can account for up to half of the total data center power. This is an area where the industry has recently started to put a great deal of focus.
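One common way to express that overhead is Power Usage Effectiveness (PUE), the ratio of total facility power to IT equipment power. A minimal sketch, with made-up numbers rather than measurements of any particular facility:

```python
# Illustrative Power Usage Effectiveness (PUE) calculation. The numbers are
# invented for illustration; they do not describe any particular facility.
def pue(total_facility_kw, it_equipment_kw):
    """PUE = total facility power / IT equipment power (1.0 is the ideal)."""
    return total_facility_kw / it_equipment_kw

# Worst case mentioned above: cooling and other overhead consume as much
# power as the IT equipment itself.
print(pue(total_facility_kw=2000, it_equipment_kw=1000))  # 2.0
# A facility that has aggressively attacked its cooling overhead.
print(pue(total_facility_kw=1200, it_equipment_kw=1000))  # 1.2
```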

In years past, a common tactic of fringe computer gamers has been to “overclock” the CPU and GPU chips, running the chips faster, while consuming more energy, to get the last possible percent of performance out of the chips. One need read no further than the latest Nvidia Kepler whitepaper to see that this trend is reversing, i.e. chip designers are making tradeoffs that allow them to get more performance by running the chips slower, not faster, a not totally intuitive idea for those who don’t delve into circuit design every day.

From the Nvidia whitepaper: ‘Running execution units at a higher clock rate allows a chip to achieve a given target throughput with fewer copies of the execution units, which is essentially an area optimization, but the clocking logic for the faster cores is more power‐hungry. For Kepler, our priority was performance per watt. While we made many optimizations that benefitted both area and power, we chose to optimize for power even at the expense of some added area cost, with a larger number of processing cores running at the lower, less power‐hungry GPU clock.’
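A rough way to see why “wider but slower” can win on performance per watt is the textbook dynamic power relation, P ∝ C·V²·f, combined with the fact that a lower clock usually permits a lower supply voltage. The sketch below uses invented numbers to illustrate that general principle; it is not a model of Kepler’s actual design, which as the quote notes also saves power in the clocking logic itself.

```python
# Illustration of the "wider but slower" tradeoff using the textbook dynamic
# power relation P ~ C * V^2 * f, assuming a lower clock permits a lower
# supply voltage. All numbers are invented; this is not a model of Kepler.
def throughput(cores, freq_ghz, ops_per_cycle=2):
    """Relative peak throughput across all execution units."""
    return cores * freq_ghz * ops_per_cycle

def dynamic_power(cores, capacitance, voltage, freq_ghz):
    """Relative dynamic power of `cores` identical execution units."""
    return cores * capacitance * voltage**2 * freq_ghz

designs = [("fast and narrow", 512, 1.1, 1.4),   # fewer cores, high clock, higher voltage
           ("slow and wide", 1024, 0.9, 0.7)]    # twice the cores at half the clock
for name, cores, volts, freq in designs:
    perf = throughput(cores, freq)
    watts = dynamic_power(cores, capacitance=1.0, voltage=volts, freq_ghz=freq)
    print(f"{name}: perf={perf:.0f}, power={watts:.0f}, perf/watt={perf/watts:.2f}")
```

Both designs deliver the same peak throughput, but the wider, slower one burns noticeably less dynamic power, which is the essence of the tradeoff the whitepaper describes.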

From ARM to Kepler, or just packing a rack full of traditional x86 powered servers, the design goal of extreme energy efficiency drives a new way of thinking about computer design, as witnessed by HP’s Project Moonshot. Increasingly, be it building ExaFLOP supercomputers or ExaSCALE compute farms for Microsoft or Facebook, one needs to think beyond just the processor, memory, and disk drive to build a power-efficient system; one must design the server, the networking, the storage, the rack, and the data center power and cooling in concert.

So whether you are thinking about the first ZettaFLOP system, like Professor Sterling, or the last ExaFLOP one, there are plenty of software and hardware challenges left for everyone in the computer technology field.
