The Power of Q

Sometimes a single letter can be very powerful, like the “Q” in the new Nvidia Tesla M2070Q option for the HP ProLiant SL390s G7 server. The power of Q is best understood by walking through an example workflow.

Let's say your application requires a system with peak performance of 1000 gigaflops (a gigaflop = 1 billion floating point operations per second). This can be achieved today using eight standard two-socket x86 servers. Now let's assume that your application performs 100 FLOPs per byte of input data, sustains 10% of peak FLOPS, and generates one byte of output data per byte of input data. At 10% efficiency the system sustains 100 gigaflops, which at 100 FLOPs per byte means it consumes, and therefore also generates, one gigabyte (GB) of data per second. If your application runs for one hour, the resulting dataset will be 3.6 terabytes (TB) in size. You are likely to collect the data from those eight servers into a shared file system. A write rate of 1GB/sec requires a high end NFS file system, and it is the performance range at which you would start considering a parallel file system like HP’s Lustre file system solution.
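The arithmetic above can be checked with a short Python sketch. The function and parameter names are illustrative only, and 1 GB is taken as 10^9 bytes; the inputs are the figures from the example.

```python
# Back-of-the-envelope sizing for the example workload.
# Names are hypothetical; inputs are the figures quoted in the text.

def output_rate_gb_per_s(peak_gflops, efficiency, flops_per_input_byte,
                         output_bytes_per_input_byte):
    """Sustained output data rate in GB/sec (1 GB = 1e9 bytes)."""
    sustained_flops = peak_gflops * 1e9 * efficiency
    input_bytes_per_s = sustained_flops / flops_per_input_byte
    return input_bytes_per_s * output_bytes_per_input_byte / 1e9


def dataset_size_tb(rate_gb_per_s, runtime_s):
    """Total data written over the run, in TB (1 TB = 1000 GB)."""
    return rate_gb_per_s * runtime_s / 1000.0


# 1000 GF peak, 10% efficiency, 100 FLOPs per input byte, 1:1 output ratio
rate = output_rate_gb_per_s(1000, 0.10, 100, 1)
size = dataset_size_tb(rate, 3600)  # one-hour run
print(f"write rate: {rate:.1f} GB/sec, dataset: {size:.1f} TB")
```

Changing the efficiency or the FLOPs-per-byte ratio shows how quickly the storage requirement moves: an application that does less arithmetic per byte produces proportionally more data.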

Of course, generating the data is the easy part. Now let's say you want to view the data. A 3.6 TB dataset is likely to require a high end graphics card. High end cards like the Nvidia Quadro FX-5800 cost over $3000, and that is before the cost of a workstation like the HP Z800. The real complication, however, is that once you have waited an hour to generate your dataset, you are not going to want to wait another hour to download it to your workstation. That means your file system will need to support read speeds much greater than 1GB/sec. And if you have multiple people viewing the results, multiply your file system and network bandwidth needs accordingly.
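To put rough numbers on the download problem, here is a small Python sketch. It assumes a 10 gigabit-per-second workstation link running at full line rate with no protocol overhead, and a 10-minute download target; both are illustrative assumptions rather than figures from the example, as are the function names.

```python
# Rough download-time and read-bandwidth estimates for the 3.6 TB dataset.
# Assumptions (not from the text): ideal line rate, 10-minute target.

def transfer_time_s(dataset_gb, link_gbit_per_s):
    """Seconds to move dataset_gb gigabytes over a link rated in gigabits/sec."""
    link_gb_per_s = link_gbit_per_s / 8.0  # 8 bits per byte
    return dataset_gb / link_gb_per_s


def required_read_gb_per_s(dataset_gb, target_s, viewers=1):
    """Aggregate file system read rate so every viewer finishes within target_s."""
    return dataset_gb * viewers / target_s


dataset_gb = 3600  # 3.6 TB
minutes_on_10g = transfer_time_s(dataset_gb, 10) / 60
one_viewer = required_read_gb_per_s(dataset_gb, 600)
eight_viewers = required_read_gb_per_s(dataset_gb, 600, viewers=8)
print(f"{minutes_on_10g:.0f} min on one 10 Gb/sec link")
print(f"{one_viewer:.0f} GB/sec for one viewer, {eight_viewers:.0f} GB/sec for eight")
```

Even at ideal line rate, a single 10 Gb/sec link needs most of an hour to move the dataset, which is why the read side ends up driving the storage and network design.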

So let's take a look at where we are so far. You will need:

  • Eight two-socket x86 servers, with at least 10Gb/sec of network bandwidth each
  • High performance file system with multiple GB/sec read and write performance (at least four 40Gb/sec connections)
  • High speed network switch, either 40Gb/sec InfiniBand or 10Gb/sec Ethernet, with at least 20 available ports (8 server, 8 workstation, 4 storage)
  • Eight workstations with high-end graphics cards and at least 10Gb/sec of network bandwidth each

The costs add up pretty quickly. Now let's consider what your system might look like if you used the M2070Q. Let's start by configuring an HP ProLiant SL390s G7 server with eight M2070Q cards. That will give you over 4000 GF of peak performance for starters. Even if your GPU codes reach only half the efficiency of the CPU version, you would still get double the sustained performance of the eight two-socket x86 servers, and thus your application would finish in 30 minutes versus one hour. But that is just the start.
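The 30-minute figure follows from the same kind of estimate. In this Python sketch the GPU peak is rounded down to 4000 GF and "half the efficiency" is read as 5% of peak versus 10% for the CPU cluster; the function name is hypothetical.

```python
# Compare run time for the same total work on the two configurations.
# Assumes 4000 GF GPU peak (rounded down) and 5% GPU efficiency vs 10% CPU.

def runtime_minutes(work_gflop, peak_gflops, efficiency):
    """Minutes needed to complete work_gflop at the given sustained rate."""
    sustained_gflops = peak_gflops * efficiency
    return work_gflop / sustained_gflops / 60.0


# Total work done by the CPU cluster in its one-hour run, in GFLOP
work = 1000 * 0.10 * 3600

cpu_minutes = runtime_minutes(work, 1000, 0.10)
gpu_minutes = runtime_minutes(work, 4000, 0.05)
print(f"CPU cluster: {cpu_minutes:.0f} min, GPU server: {gpu_minutes:.0f} min")
```

The real ratio depends entirely on how well a given code maps to the GPU, so treat the efficiency values as placeholders to adjust for your own application.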

With the M2070Q, you get all the power of your high-end graphics card right in your server. While you are still likely to want to save your data to a shared file system, you can use the M2070Q to start visualizing your data in situ, directly from the server after it is computed. The Tesla M2070Q delivers up to 1.3 billion triangles per second of server graphics performance, allowing you to deploy GPUs for both compute and visualization in one convenient solution. In turn, you can provision significantly less storage and network bandwidth than if you had to move the data from the server to storage and then to a workstation to display. The result is that you can replace a cluster of 8 servers and 8 workstations with a simple config of:

  • One ProLiant SL390s G7 server with two 40Gb/sec IB connections
  • Eight M2070Q GPUs
  • High performance file system with single GB/sec read and write speed (2 IB connections)
  • QDR IB switch, 4 ports available (2 for server, 2 for storage)

The University of Hamburg was one of the first HP customers to take advantage of the power of Q, in their SL390 system announced back in June. Since then, HP has sold M2070Q based systems to customers across multiple industries who are solving demanding visualization requirements with server based graphics.

  • About Marc Hamilton

    Marc Hamilton – Vice President, Solutions Architecture and Engineering, NVIDIA. At NVIDIA, the Visual Computing Company, Marc leads the worldwide Solutions Architecture and Engineering team, responsible for working with NVIDIA’s customers and partners to deliver the world’s best end to end solutions for professional visualization and design, high performance computing, and big data analytics. Prior to NVIDIA, Marc worked in the Hyperscale Business Unit within HP’s Enterprise Group where he led the HPC team for the Americas region. Marc spent 16 years at Sun Microsystems in HPC and other sales and marketing executive management roles. Marc also worked at TRW developing HPC applications for the US aerospace and defense industry. He has published a number of technical articles and is the author of the book, “Software Development, Building Reliable Systems”. Marc holds a BS degree in Math and Computer Science from UCLA, an MS degree in Electrical Engineering from USC, and is a graduate of the UCLA Executive Management program.

    2 Responses to The Power of Q

    1. AccelerEyes says:

      Nice post. We frequently see similar design decisions being faced by LibJacket users. We also sense the growing advantages to doing compute and visualization on the same platform, so the config you mention above is well-suited to address that growing demand. On the software side, we baked a high-performance OpenGL-based Graphics Library into LibJacket so that it’s easy to develop high-fidelity visualizations seamlessly into the GPU compute functionality of CUDA programs. More about that here:

      • Thanks AccelerEyes team for the comment. It is great to see more and more commercial ISVs supporting CUDA and GPUs and we certainly have a lot of HP customers that use AccelerEyes on their ProLiant SL390s systems.
