Nvidia’s latest Tesla M2090 GPU shows once again that Moore’s Law is alive and well, driving GPU performance just as it drives CPUs. With 512 CUDA parallel processing cores, the M2090 has already set multiple application performance records, according to Nvidia. While just about any system vendor will sell you a GPU today, HP leads the market with over 10 years of co-design experience with Nvidia and is among the first to market with the new M2090, offering it today in servers like the HP ProLiant SL390s G7. The 4U SL390s supports up to eight M2090 GPUs, an industry-leading GPU-to-CPU ratio.
Nvidia’s M2090 joins the M2070 and M2050 in the Tesla family of GPUs designed for server-based applications. There are a few common mistakes I see customers make when they first start down the path of GPU computing. The first is trying to use a GPU designed for graphics processing, like Nvidia’s Quadro line, for server-based computing. While they share many of the same processing elements as Tesla GPUs, Nvidia’s Quadro GPUs are optimized for graphics processing, not server number crunching. If one pixel in your display lights up with the wrong color for 1/60th of a second every day due to an uncorrected memory error on a Quadro GPU, you are unlikely to even notice. Get one multiplication wrong in an HPC job that runs for a day, however, and you’ve wasted 24 hours of computation. Nvidia’s M2090 not only provides ECC single-bit error correction and double-bit error detection in DRAM, but extends that ECC protection to the register files and the GPU’s L1 and L2 caches.
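The correct-single/detect-double behavior that ECC memory provides can be illustrated with a classic Hamming(7,4) code. This is a minimal teaching sketch in plain Python (the function names are my own, and real SECDED hardware adds an overall parity bit so that double-bit errors are flagged rather than miscorrected); it is not GPU code:

```python
def hamming74_encode(nibble):
    """Encode a 4-bit value into a 7-bit Hamming codeword
    (bit positions 1-7, parity bits at positions 1, 2, 4)."""
    d = [(nibble >> i) & 1 for i in range(4)]   # data bits d0..d3
    c = [0] * 8                                 # index 0 unused
    c[3], c[5], c[6], c[7] = d
    c[1] = c[3] ^ c[5] ^ c[7]                   # parity over positions 3,5,7
    c[2] = c[3] ^ c[6] ^ c[7]                   # parity over positions 3,6,7
    c[4] = c[5] ^ c[6] ^ c[7]                   # parity over positions 5,6,7
    return c[1:]

def hamming74_correct(cw):
    """Return (corrected codeword, syndrome). A non-zero syndrome is
    the 1-based position of a single-bit error, which gets flipped back."""
    syn = 0
    for pos, bit in enumerate(cw, start=1):
        if bit:
            syn ^= pos
    fixed = list(cw)
    if syn:
        fixed[syn - 1] ^= 1
    return fixed, syn

cw = hamming74_encode(0b1011)
bad = list(cw)
bad[5] ^= 1                                     # flip one stored bit
fixed, pos = hamming74_correct(bad)
assert fixed == cw and pos == 6                 # error found and repaired
```

The point of the Quadro-versus-Tesla comparison above is exactly this syndrome step: without it, that flipped bit silently becomes a wrong operand in your computation.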
The second common mistake is skimping on GPU memory. The M2090 shares the same 6GB memory size as the M2070, an important distinction from Nvidia’s M2050, which comes with only 3GB. GPUs deliver their best performance when the problem’s working set fits in GPU memory, so the M2070 and M2090 commonly work well on problems up to twice the size of what fits in the M2050. Surprisingly, I’ve seen three major multi-million-dollar RFPs this month alone that specified the M2050. The 6GB now standard on the M2070 and M2090 should be considered the minimum GPU memory size for new systems, unless you are truly certain you will never need to run a GPU application requiring more than 3GB over the lifetime of your system.
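To make the sizing argument concrete, here is a back-of-envelope sketch (plain Python, with a hypothetical function name) for the simple case of dense double-precision matrix multiply, which keeps three N×N matrices resident in GPU memory:

```python
import math

GB = 1 << 30

def max_dgemm_n(gpu_mem_bytes):
    """Largest N for C = A x B with three N x N double-precision
    (8-byte) matrices held entirely in GPU memory."""
    return math.isqrt(gpu_mem_bytes // (3 * 8))

print(max_dgemm_n(3 * GB))   # 3GB card (M2050-class):      N up to 11585
print(max_dgemm_n(6 * GB))   # 6GB card (M2070/M2090-class): N up to 16384
```

Doubling GPU memory doubles the working set that fits (N itself grows by roughly a factor of √2), which is the “problems up to twice the size” point above. Real applications also need scratch space, so the usable fraction is somewhat less in practice.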
Finally, the third common mistake I see customers make is simply bolting GPUs onto conventional x86 servers. GPUs can deliver 10x or more the performance of a traditional x86 CPU, and you wouldn’t drop an engine with 10x the power into an economy car, so why do that with your server? HP’s SL390s was designed from the ground up for GPU computing, with features like twice the PCIe capacity (for connecting GPUs to the system) and built-in 10GbE/InfiniBand networking (to move data between the network and the GPUs).
So in summary, my advice for anyone considering GPU computing is: