The Secret Weapon of NVIDIA’s Solution Architect Team

NVIDIA’s worldwide team of solution architects works with our largest customers to solve some of the toughest high performance computing, deep learning, enterprise graphics virtualization, and advanced visualization problems. Often, as in the recent US Department of Energy CORAL award, the systems our customers purchase are many times larger than anything we have on the NVIDIA campus. While nowhere near the size of CORAL, one of the secret weapons of NVIDIA’s Solution Architect team is our benchmarking and customer test cluster, located behind locked doors deep within NVIDIA’s Santa Clara campus.

What it lacks in size, the system makes up for with the latest GPUs, servers, storage, and networking gear from NVIDIA and our partners. One of our recent additions, which is seeing heavy use, is a rack of Cray CS-Storm servers, each fully loaded with eight K80 GPUs. We also have a Cray XC30 system with GPUs.

We have many different types and brands of servers, not only with the latest NVIDIA GPUs but also with high-end 16-core Haswell CPUs, plenty of memory (256 GB on many servers), and the latest networking technology, including Mellanox 56Gb FDR InfiniBand and Arista low-latency 10/40GbE switches. The systems are supported by multiple types of storage; one of our newest additions is a large-capacity Pure Storage all-flash storage array. The Pure Storage array sees dual use, supporting NVIDIA Grid vGPU instances running on VMware ESX and Citrix Xen hypervisors, with a separate partition allocated for HPC applications.

Doug, our superstar system manager, is almost constantly adding new platforms to the system. Walking through the lab today, I spotted a pile of Dell C4130 servers waiting to be mounted in racks and outfitted with four K80 GPUs each before being put to use by our solution architects to benchmark customer applications.

Of course, we also have GPU servers with Power8 and ARM-64 CPUs, so solution architects and customers can test applications in a cross-platform environment. Sometimes more important than the mix of servers, however, is the full complement of NVIDIA and partner software we have installed on the systems. This ranges from the latest CUDA 7 RC to powerful NVIDIA libraries like cuDNN, integrated with Caffe and ready to go for training deep learning networks. For our Grid enterprise graphics virtualization business, the system supported our recent vGPU VMware early access program. Now that VMware has officially launched support for vGPU, the system is being used for the DIRECT ACCESS TO NVIDIA GRID™ vGPU™ WITH VMware Horizon® and vSphere® program.

While HP is currently a bit under-represented on the server side, we are excited to be getting a new HP BladeSystem to work with shortly. On solution architects’ desks, though, the HP Z840 is by far the favorite. Best features: support for multiple Quadro and Tesla GPUs, super-quiet operation, and a snap-in, tool-less design that makes swapping in new GPUs or other components a breeze. For walking between offices and the server room, however, the favorite solution architect laptop these days is the new 14″ HP Chromebook. Internally we run a technology preview of the next generation of the VMware Blast protocol, which delivers super-fast workstation-class graphics to the Tegra K1 powered HP Chromebook. Two monitors is pretty much the minimum on any solution architect’s desk, and some have quite a few more.

The systems all live on our cloud. Besides their use by NVIDIA solution architects, we also provide customers with Cisco VPN-secured remote access, from anywhere in the world, to test our latest offerings. These days, many of the systems are busy preparing and testing demos for next month’s GPU Technology Conference. While the exact content of the demos is a secret I can’t share, let’s just say we are doing a lot of deep neural network training right now on many of those K80 GPUs.

It is a great resource, and we couldn’t do our job and serve our customers without it. And a big special thanks to all of our partners who contribute to the system’s success, including Arista, Cisco, Cray, Dell, HP, Pure Storage, and Supermicro.

About Marc Hamilton

Marc Hamilton – Vice President, Solutions Architecture and Engineering, NVIDIA. At NVIDIA, the Visual Computing Company, Marc leads the worldwide Solutions Architecture and Engineering team, responsible for working with NVIDIA’s customers and partners to deliver the world’s best end to end solutions for professional visualization and design, high performance computing, and big data analytics. Prior to NVIDIA, Marc worked in the Hyperscale Business Unit within HP’s Enterprise Group where he led the HPC team for the Americas region. Marc spent 16 years at Sun Microsystems in HPC and other sales and marketing executive management roles. Marc also worked at TRW developing HPC applications for the US aerospace and defense industry. He has published a number of technical articles and is the author of the book, “Software Development, Building Reliable Systems”. Marc holds a BS degree in Math and Computer Science from UCLA, an MS degree in Electrical Engineering from USC, and is a graduate of the UCLA Executive Management program.
This entry was posted in Cloud Computing, HPC.