If the slow page load times on the Nvidia GPU Tech Conference site this morning are any indication, there is going to be a record crowd at this year’s gathering of Nvidia fans. No doubt, when Nvidia co-founder and CEO Jen-Hsun Huang takes the stage this week, he will talk about the increased performance of Nvidia’s newest GPUs, some of which HP has been busy testing in our labs over the last several months. Of course, we can’t talk about just how fast any of Nvidia’s new GPUs are until they are officially announced, but we can talk about how to get ready: by optimizing systems you can purchase from HP today to get the best performance out of current and future Nvidia GPUs.
Adding one or more GPUs to some servers is like dropping a turbocharged BMW racing engine into a compact car. In contrast, the HP ProLiant SL250s Gen8 server, pictured below, is the latest in a family of ProLiant SL servers designed from the ground up for GPU computing. A single SL250s supports up to three GPUs, meaning you can pack a total of 12 GPUs into 4 rack units of space using the HP ProLiant SL6500 chassis.
When you start adding GPUs to a server, you definitely need to start paying closer attention to network performance. Some of HP’s earliest and largest ProLiant SL installations, such as the TSUBAME2.0 system at the Tokyo Institute of Technology, have used dual-rail InfiniBand. The SL250s supports not one but two full-speed 56 Gb/s FDR InfiniBand connections, fully optimized to take advantage of the high-speed PCIe Gen3 interface on the SL250s. In addition, to maximize performance, each FDR IB connection is routed to a separate Intel Xeon E5-2600 series (Sandy Bridge) processor, helping to minimize latency and jitter, which is especially important in some HPC applications. HP internal benchmarks show that dual-rail IB can accelerate some applications by 20-30% or more compared to single rail. And while dual-rail IB can increase performance even on small clusters, the benefit typically grows as you add systems and GPUs to the cluster.
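To put the dual-rail numbers in perspective, here is a back-of-the-envelope sketch of the raw link bandwidth involved. It uses only the published FDR signaling parameters (14.0625 Gb/s per lane, 4x ports, 64b/66b encoding); real application throughput will be lower due to protocol overhead.

```python
# Rough effective bandwidth of dual-rail FDR InfiniBand.
# FDR signals at 14.0625 Gb/s per lane with 64b/66b encoding;
# a standard 4x port carries four lanes. Encoding-level math only.

LANE_RATE_GBPS = 14.0625   # FDR signaling rate per lane
LANES_PER_PORT = 4         # 4x port width
ENCODING = 64 / 66         # 64b/66b line-encoding efficiency

per_rail = LANE_RATE_GBPS * LANES_PER_PORT * ENCODING
dual_rail = 2 * per_rail

print(f"per rail:  {per_rail:.1f} Gb/s")    # ~54.5 Gb/s
print(f"dual rail: {dual_rail:.1f} Gb/s")   # ~109.1 Gb/s
```

Doubling the rails doubles the raw pipe, but as the benchmark figures above show, how much of that an application captures depends on its communication pattern, which is why the benefit tends to grow with cluster size.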
If you are considering adding next-generation PCIe Gen3 GPUs to a server, you should also take note of maximum PCIe Gen3 signal lengths. By incorporating the GPUs directly into the SL250s system, rather than housing them in a separate external enclosure, HP eliminates bulky external PCIe cables and other system components. This matters more than ever because the increased clock rates of PCIe Gen3 require shorter signal paths than Gen2. If you have a non-HP server using current PCIe Gen2 based GPUs in an external chassis, you may be sadly disappointed when you try to upgrade to future PCIe Gen3 GPUs.
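The reason Gen3 is worth chasing despite the tighter signal budget comes down to encoding as much as clock rate. A short sketch using the standard PCIe parameters (Gen2: 5 GT/s with 8b/10b encoding; Gen3: 8 GT/s with 128b/130b):

```python
# Why PCIe Gen3 nearly doubles Gen2 throughput despite only a 60% higher
# signaling rate: Gen3 swaps 8b/10b encoding (20% overhead) for 128b/130b
# (~1.5% overhead). 1 GB/s = 8 Gb/s.

def lane_gbps(transfer_rate_gt, encoding_eff):
    """Effective data rate of one lane in Gb/s."""
    return transfer_rate_gt * encoding_eff

gen2_lane = lane_gbps(5.0, 8 / 10)      # 4.0 Gb/s per lane
gen3_lane = lane_gbps(8.0, 128 / 130)   # ~7.88 Gb/s per lane

# A GPU typically sits in a x16 slot.
gen2_x16_GBs = gen2_lane * 16 / 8
gen3_x16_GBs = gen3_lane * 16 / 8

print(f"Gen2 x16: {gen2_x16_GBs:.2f} GB/s per direction")   # 8.00
print(f"Gen3 x16: {gen3_x16_GBs:.2f} GB/s per direction")   # ~15.75
```

Roughly 8 GB/s versus almost 16 GB/s per direction on a x16 slot: that is the bandwidth left on the table if an external Gen2 enclosure can't be upgraded.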
Another important factor in getting the most performance out of current and future GPUs is using the fastest host memory. While GPUs come packaged with their own memory, data typically flows through system memory on its way to and from the GPU. The HP ProLiant SL250s supports the industry’s fastest 1600 MHz memory and, unlike some competitors’ servers, supports both memory DIMMs per channel running at the full 1600 MHz clock rate.
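To see why the full 1600 MHz rate matters, here is a quick peak-bandwidth estimate for the host memory feeding the GPUs. It assumes DDR3-1600 (1600 MT/s over a 64-bit channel) and the four memory channels per socket of the Xeon E5-2600 family; sustained bandwidth will be somewhat lower than this theoretical peak.

```python
# Peak host memory bandwidth for DDR3-1600.
# Each 64-bit channel moves 8 bytes per transfer; the Xeon E5-2600
# (Sandy Bridge-EP) provides four memory channels per socket.
# Assumes both DIMMs per channel run at the full 1600 MT/s.

TRANSFERS_PER_SEC = 1600e6   # DDR3-1600: 1600 MT/s
BYTES_PER_TRANSFER = 8       # 64-bit channel width
CHANNELS_PER_SOCKET = 4      # Sandy Bridge-EP
SOCKETS = 2                  # two-socket SL250s configuration

per_channel = TRANSFERS_PER_SEC * BYTES_PER_TRANSFER / 1e9   # GB/s
per_system = per_channel * CHANNELS_PER_SOCKET * SOCKETS

print(f"per channel: {per_channel:.1f} GB/s")        # 12.8 GB/s
print(f"two-socket system: {per_system:.1f} GB/s")   # 102.4 GB/s
```

A server that drops to a slower clock with two DIMMs per channel gives up a proportional slice of that 12.8 GB/s per channel, and with multiple GPUs streaming data through host memory at once, that headroom gets used.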
It promises to be an exciting week at the Nvidia GPU Tech Conference, and I’ll be sharing some of my thoughts from the show on my blog throughout the week.