For those of you who didn’t catch my update on LinkedIn, since my last post I’ve started a new position at NVIDIA leading their worldwide solutions architecture and engineering team. I’m super excited about my new job, in part because it will no doubt lead to some great new blogs. Meanwhile, the team has wasted no time getting me on the road, and I am spending the week in India to attend the HiPC conference in Bangalore as well as the Indo-US Workshop on High Performance Computing Applications and Big Data Analytics, sponsored by the Indian Institute of Science and Oak Ridge National Laboratory (ORNL).
I probably don’t have to remind anyone that ORNL is home to Titan, the world’s most powerful GPU-accelerated supercomputer. I was pleasantly surprised, however, to see how many workshop presentations by local Indian researchers focused on GPU-related topics. All other advances aside, one of the great things about GPU computing is that you don’t need a Titan-sized budget to do meaningful scientific work with GPUs. Many of the results being achieved on much smaller GPU-based systems today simply wouldn’t have been affordable without GPUs. And that is before paying the power costs. So not only are we seeing great research results out of ORNL, but research centers across India and in many other countries with far more limited research dollars can today achieve amazing science with GPU computing.
Oil companies are sometimes in the news for how much profit they make, but let’s take a look at the computing requirements of various algorithms being pursued in that space.
The current “holy grail” of seismic research, elastic imaging, requires about 120 PetaFLOPs of compute power, still far out of reach of the world’s fastest supercomputers.
Put in perspective, elastic imaging, done today on standard x86 processors without accelerators, would consume 17% of the peak output of Tehri hydroelectric power station, or about 376 megawatts.
The goal of ExaScale programs in the US and elsewhere around the world is to deliver an exaflop while consuming no more than 20-30 megawatts. At that efficiency, a 120-petaflop system would land in the 2-3 megawatt range, well within the power range of data centers operated today by many energy companies, such as the recently opened BP data center in Houston.
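To make the scaling explicit, here is a rough back-of-the-envelope sketch in Python. It assumes power scales linearly with FLOPS at a fixed efficiency, which is a simplification, and the function name `scaled_power_mw` is just an illustrative helper, not something from any library. The only inputs are the figures quoted above: an exaflop at 20-30 megawatts, a 120-petaflop workload, and the 376-megawatt x86 estimate.

```python
# Back-of-the-envelope scaling, assuming power grows linearly with FLOPS
# at a fixed FLOPS-per-watt efficiency (a simplifying assumption).

EXAFLOP = 1e18    # FLOP per second
PETAFLOP = 1e15   # FLOP per second


def scaled_power_mw(target_flops, ref_flops, ref_power_mw):
    """Power in MW for target_flops at the reference system's efficiency."""
    return ref_power_mw * target_flops / ref_flops


elastic_imaging_flops = 120 * PETAFLOP  # figure quoted in the post
x86_power_mw = 376.0                    # x86-only estimate quoted in the post

# ExaScale target: one exaflop within a 20-30 MW envelope.
low_mw = scaled_power_mw(elastic_imaging_flops, EXAFLOP, 20.0)
high_mw = scaled_power_mw(elastic_imaging_flops, EXAFLOP, 30.0)

print(f"120 PF at ExaScale efficiency: {low_mw:.1f} to {high_mw:.1f} MW")
print(f"Reduction vs. x86 estimate: {x86_power_mw / high_mw:.0f}x "
      f"to {x86_power_mw / low_mw:.0f}x")
```

Under these assumptions the result comes out around 2.4 to 3.6 megawatts, in line with the rough 2-3 megawatt figure, and on the order of a hundred times less power than the x86-only estimate.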
Reaching the 20 picojoule/FLOP efficiency required by ExaScale systems will take much more than improved process technology and smaller transistors. Increasingly, the focus on power efficiency will have to turn to minimizing data movement. Technologies like Unified Memory, recently introduced in CUDA 6, not only make it easier to program the more energy-efficient FLOPS inside the GPU, they also pave the way for increased hardware support for data locality in future processors.
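For anyone who wants to check where the 20 picojoule/FLOP number comes from, the arithmetic is simply power divided by sustained FLOPS. The short Python sketch below uses only the figures quoted in this post (an exaflop at 20 megawatts, and 120 petaflops at 376 megawatts for the x86-only case); the helper name `picojoules_per_flop` is my own and purely illustrative.

```python
# Energy per floating-point operation = power (W) / throughput (FLOP/s).
# One watt is one joule per second, so the result is joules per FLOP.

def picojoules_per_flop(power_watts, flops_per_second):
    """Return energy per FLOP in picojoules (1 pJ = 1e-12 J)."""
    return power_watts / flops_per_second * 1e12


# ExaScale target: 1 exaflop (1e18 FLOP/s) at 20 MW (20e6 W).
exascale_pj = picojoules_per_flop(20e6, 1e18)

# x86-only estimate from the post: 120 PF (120e15 FLOP/s) at 376 MW.
x86_pj = picojoules_per_flop(376e6, 120e15)

print(f"ExaScale target:   {exascale_pj:.0f} pJ/FLOP")
print(f"x86-only estimate: {x86_pj:.0f} pJ/FLOP")
print(f"Gap to close:      {x86_pj / exascale_pj:.0f}x")
```

By this estimate the x86-only case sits at roughly 3,000 picojoules per FLOP, which gives a sense of how large the efficiency gap is.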
Credits to Ty M. for the Tehri and Elastic Imaging examples.