I’m off to San Antonio this weekend for the start of the Society of Exploration Geophysicists (SEG) meeting. Geophysicists in the oil and gas community were among the early adopters of GPU computing for the massive amounts of data processing that modern-day oil exploration requires. It should come as no surprise that the HP ProLiant SL390s with Nvidia GPUs is used by many energy companies. Commonly used algorithms for reverse time migration (RTM) are particularly well suited to GPU implementations.
With other vendors, the most common GPU server configuration pairs 2 GPUs, such as the Nvidia M2090, with a 2-CPU x86 server. This is because each GPU needs an x16 PCI Express (PCIe) connection, and a server with a single PCIe IO Hub (IOH) has only a limited number of PCIe lanes available. By working with early adopters of GPU computing several years ago, HP realized that once codes were ported to GPUs, customers would care less about the CPU component of the server and would want higher GPU-to-CPU ratios. HP thus enabled the SL390s with support for up to 3 GPUs in the SL390s 2U and up to 8 GPUs in the SL390s 4U.
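The lane arithmetic behind that two-GPU limit can be sketched in a few lines of Python. The lane counts used here are assumptions chosen for illustration, not figures from HP or Nvidia: the sketch supposes a single IOH that exposes 36 lanes, with 4 of them set aside for networking and storage. The function name `max_x16_gpus` is likewise hypothetical.

```python
def max_x16_gpus(ioh_lanes, reserved_lanes=0, lanes_per_gpu=16):
    """Return how many GPUs can each get a full x16 link from a lane budget.

    ioh_lanes      -- total lanes the IO hub(s) expose (assumed value)
    reserved_lanes -- lanes set aside for NICs, storage, etc. (assumed value)
    lanes_per_gpu  -- 16 for the x16 connection each GPU needs
    """
    usable = ioh_lanes - reserved_lanes
    if usable < 0:
        raise ValueError("reserved_lanes exceeds ioh_lanes")
    return usable // lanes_per_gpu


# Hypothetical lane budget for a server with a single IOH.
ASSUMED_IOH_LANES = 36
ASSUMED_RESERVED_LANES = 4

if __name__ == "__main__":
    gpus = max_x16_gpus(ASSUMED_IOH_LANES, ASSUMED_RESERVED_LANES)
    print(f"Full-bandwidth x16 GPUs on one IOH: {gpus}")
```

Under those assumed numbers the budget covers two x16 GPUs and no more. The sketch says nothing about how the SL390s itself wires its GPU slots.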
Still, customers often ask whether they should buy a single 8-GPU server instead of two 4-GPU servers or four 2-GPU servers. Clearly there are cost savings in buying only half or a quarter as many servers (even though the GPU cost component is the same in each case), and there are also management savings. But what about power savings and overall efficiency?
We recently ran a test for a large oil & gas customer comparing the SL390s 2U with 2 GPUs to the SL390s 4U configured with both 4 and 8 GPUs. The test performed forward wave modeling, a major component of RTM workloads. We took the 2-GPU server as the baseline and defined its performance per watt as an efficiency factor of 1.0. We then ran the same application on a 4-GPU and an 8-GPU server, with the goal of reaching the same efficiency level, i.e. showing that there were no performance bottlenecks in the 4- and 8-GPU configurations that would lower efficiency. The results were better than expected in both efficiency and performance. The 4- and 8-GPU servers achieved efficiency factors of 1.23 and 1.45 respectively, significantly better than the 2-GPU baseline.
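For readers who want to run the same comparison on their own hardware, here is a minimal sketch of how such an efficiency factor can be computed. It assumes the factor is simply the performance per watt of a configuration divided by that of the baseline. The function name and the sample numbers are made up for illustration and are not the measurements from the test described above.

```python
def efficiency_factor(perf, watts, base_perf, base_watts):
    """Performance per watt of a configuration relative to a baseline.

    perf / base_perf   -- throughput in any consistent unit
    watts / base_watts -- average power draw during the run
    A result of 1.0 means the same efficiency as the baseline.
    """
    if min(perf, watts, base_perf, base_watts) <= 0:
        raise ValueError("all inputs must be positive")
    # Equivalent to (perf / watts) / (base_perf / base_watts).
    return (perf * base_watts) / (watts * base_perf)


if __name__ == "__main__":
    # Illustrative numbers only, NOT measured results.
    baseline = {"perf": 100, "watts": 1000}
    candidate = {"perf": 400, "watts": 3200}
    factor = efficiency_factor(candidate["perf"], candidate["watts"],
                               baseline["perf"], baseline["watts"])
    print(f"Efficiency factor vs. baseline: {factor:.2f}")
```

A factor above 1.0 means the larger server does more work per watt than the baseline. A factor below 1.0 would point to a bottleneck somewhere in the denser configuration.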
While your results may vary, the results above should not be surprising. With Nvidia’s CUDA parallel computing architecture, the GPUs do most of the work while the CPUs are used mainly for housekeeping, including getting data to and from the GPUs. Two CPUs are more than adequate to drive eight GPUs in the SL390s 4U. By contrast, running the same application on four servers with two CPUs and two GPUs each (a total of 8 CPUs) means you are essentially wasting the power drawn by the six additional CPUs, plus that of the related server infrastructure such as fans, memory, network cards, disk drives, and other components.
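The server and CPU counts in that comparison are easy to tabulate. The sketch below assumes every server carries two CPUs, as in the configurations discussed here. The helper name `cluster_footprint` is hypothetical.

```python
def cluster_footprint(total_gpus, gpus_per_server, cpus_per_server=2):
    """Return (servers, cpus) needed to host total_gpus.

    Uses ceiling division so a partly filled server still counts as one.
    """
    if total_gpus <= 0 or gpus_per_server <= 0:
        raise ValueError("GPU counts must be positive")
    servers = -(-total_gpus // gpus_per_server)  # ceiling division
    return servers, servers * cpus_per_server


if __name__ == "__main__":
    total_gpus = 8
    _, dense_cpus = cluster_footprint(total_gpus, 8)
    for gpus_per_server in (2, 4, 8):
        servers, cpus = cluster_footprint(total_gpus, gpus_per_server)
        extra = cpus - dense_cpus
        print(f"{gpus_per_server} GPUs/server: {servers} server(s), "
              f"{cpus} CPUs, {extra} more than the 8-GPU server")
```

For eight GPUs this gives four servers and eight CPUs at two GPUs per server, against one server and two CPUs at eight GPUs per server, which is where the six additional CPUs come from.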
Since no other major vendor besides HP ships a server with eight integrated GPUs, it should be an interesting few days at SEG.