Moore’s Law For GPUs

Nvidia’s latest Tesla M2090 GPU shows once again that Moore’s law is alive and well, driving GPU performance just as it drives CPU performance. With the new M2090 and its 512 CUDA parallel processing cores, Nvidia announced multiple application performance records. While just about any system vendor will sell you a GPU today, HP leads the market with over 10 years of co-design experience with Nvidia and is among the first to market with the M2090, offering it today in servers like the HP ProLiant SL390s G7 4U, which supports up to 8 M2090 GPUs for an industry-leading GPU-to-CPU ratio.

Nvidia’s M2090 joins the M2070 and M2050 in the Tesla family of GPUs designed for server-based applications. There are three common mistakes I see customers make when they first start down the path of GPU computing. The first is to try to use a GPU designed for graphics processing, like Nvidia’s Quadro line, for server-based computing. While they share many of the same processing elements as Tesla GPUs, Nvidia’s Quadro GPUs are optimized for graphics processing, not server number crunching. If one pixel in your display lights up with the wrong color for 1/60th of a second every day due to a memory bit error on a Quadro GPU, you are not likely to even notice. Get one multiplication wrong in an HPC job that runs for a day and you’ve wasted 24 hours of computation. Nvidia’s M2090 not only provides single-bit error correction and double-bit error detection in DRAM, but also extends the ECC protection to the register files and the GPU’s L1 and L2 caches.

The second mistake concerns GPU memory. The M2090 has the same 6 GB of GPU memory as the M2070. This is an important distinction from Nvidia’s M2050, which comes with only 3 GB. GPUs provide the best performance when the problem’s working set fits in GPU memory, so the M2070 and M2090 commonly work better with problems up to twice the size of what fits in the M2050. Surprisingly, I’ve seen three major multi-$M RFPs this month alone that specified the M2050 GPU. The 6 GB now standard in the M2070 and M2090 should be considered the minimum GPU memory size for new systems, unless you are 100% sure you will never need to run a GPU application that requires more than 3 GB over the lifetime of your system.
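To make the working-set point concrete, here is a rough back-of-the-envelope sketch in Python. It is only an illustration under stated assumptions: the workload (three dense double-precision n x n matrices, as in a matrix multiply), the helper names `working_set_bytes` and `fits_in_gpu`, and the `usable_fraction` headroom of 0.9 are all hypothetical choices of mine, and the 3 GB and 6 GB figures are treated as binary gigabytes. Real applications need their own accounting.

```python
GIB = 1024 ** 3  # treat "GB" as binary gigabytes (an assumption)


def working_set_bytes(n, arrays=3, bytes_per_element=8):
    """Bytes needed for `arrays` dense n x n matrices (8 bytes = double precision)."""
    return arrays * n * n * bytes_per_element


def fits_in_gpu(n, gpu_mem_gib, arrays=3, usable_fraction=0.9):
    """True if the working set fits in the given GPU memory.

    usable_fraction is a hypothetical allowance for memory the
    application cannot use for its own data; it is not a measured value.
    """
    usable = gpu_mem_gib * GIB * usable_fraction
    return working_set_bytes(n, arrays) <= usable


if __name__ == "__main__":
    for n in (10_000, 15_000):
        need_gib = working_set_bytes(n) / GIB
        print(f"n={n}: needs {need_gib:.2f} GiB, "
              f"fits in 3 GB: {fits_in_gpu(n, 3)}, "
              f"fits in 6 GB: {fits_in_gpu(n, 6)}")
```

Under these assumptions, the n = 10,000 case fits on either card, while the n = 15,000 case needs 5.4 billion bytes and only fits in the 6 GB parts. That is the kind of growth in problem size that is easy to hit over the lifetime of a system.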

Finally, the third common mistake I see customers make is simply adding GPUs to conventional x86 servers. GPUs can deliver 10x or more the performance of a traditional x86 CPU, and you wouldn’t put an engine with 10x the performance into an economy car, so why would you do that with your server? HP’s SL390s was designed from the ground up for GPU computing, with features like twice the PCI capability (for connecting GPUs to the system) and built-in 10G/IB networking (to move data between the network and the GPU).

So in summary, my advice for anyone considering GPU computing is:

  • Be sure to start with a GPU designed for server-based computing, like Nvidia’s Tesla line
  • Don’t get locked into an outdated memory footprint; within Nvidia’s Tesla line, skip the M2050 and opt for a 6 GB M2070 or M2090
  • Don’t just soup up your engine; pay attention to balanced system design and make sure your server has enough PCI bandwidth not only for the GPUs but also for other I/O-intensive devices like high-speed network interconnects
About Marc Hamilton

    Marc Hamilton – Vice President, Solutions Architecture and Engineering, NVIDIA. At NVIDIA, the Visual Computing Company, Marc leads the worldwide Solutions Architecture and Engineering team, responsible for working with NVIDIA’s customers and partners to deliver the world’s best end to end solutions for professional visualization and design, high performance computing, and big data analytics. Prior to NVIDIA, Marc worked in the Hyperscale Business Unit within HP’s Enterprise Group where he led the HPC team for the Americas region. Marc spent 16 years at Sun Microsystems in HPC and other sales and marketing executive management roles. Marc also worked at TRW developing HPC applications for the US aerospace and defense industry. He has published a number of technical articles and is the author of the book, “Software Development, Building Reliable Systems”. Marc holds a BS degree in Math and Computer Science from UCLA, an MS degree in Electrical Engineering from USC, and is a graduate of the UCLA Executive Management program.