Sometimes a single letter can be very powerful, as the “Q” in the new Nvidia Tesla M2070Q option for the HP ProLiant SL390s G7 server.. The power of Q is best envisioned by walking through an example workflow.
Lets say your application requires a system with peak performance of 1000 gigaflops (a gigaflop = 1 billion floating point operations per second). This can be achieved today using eight standard two-socket x86 servers. Now lets assume that your application performs 100 FLOPs per byte of input data, is 10% efficient in FLOPS usage, and generates one byte of output data per byte of input data. That means you will generate one gigabyte (GB) of data per second. If your application runs for one hour, that means the resulting dataset will be 3.6 terabytes (TB) in size. You are likely to collect the data from those 8 servers into a shared filesystem. 1GB/sec write capability requires a high end NFS file system and would be the performance range at which you would start considering a parallel file system like HP’s Lustre file system solution.
Of course, generating the data is the easy part. Now lets say you want to view the data. A 3.6 TB dataset is likely to require a high end graphics card. High end cards like the Nvidia Quadro FX-5800 cost over $3000, not to mention a workstation like the HP Z800. The real complication, however, is that once you have waited an hour to generate your dataset, you are not going to want to wait another hour to download the dataset to your workstation. That means your file system will need to support much greater than 1GB/sec read speeds. And if you have multiple people viewing the results, multiple your file system and network bandwidth needs even more.
So lets take a look at where we are so far, you will need:
The costs add up pretty quick. Now lets consider what your system might look like if you used the M2070Q. Lets start by configuring an HP Proliant SL390s G7 server with eight M2070Q cards. That will give you over 4000 GF of peak performance for starters. Even if you only get half the efficiency out of your GPU codes, you would still be getting double the performance of the eight two-socket x86 servers, and thus your application would finish in 30 minutes versus one hour. But that is just the start.
With the M2070Q, you get all the power of your high-end graphics card right in your server. While you still are likely to want to save your data to a shared file system, you can use the M2070Q to start visualizing your data in situ – direct from the server after it is computed. The Tesla M2070Q delivers up to 1.3 billion triangles per second of server graphics performance, allowing you to deploy GPUs for both compute and visualization in one convenient solution. In turn, this allows you to provision significantly less storage and network bandwidth than if you had to move the data from the server to storage and then to a workstation to display. This allows you to replace a cluster of 8 servers and 8 workstations with a simple config of:
The University of Hamburg was one of the first HP customers to take advantage of the power of Q in their SL390 system announced back in June. Since then, HP has sold M2070Q based systems to customers across multiple industries with demanding visualization requirements being solved with server based graphics.