Updates From NVIDIA Asia

I’ve been on a two-week tour of Asia, visiting customers and partners and speaking at our GPU Technology Workshops in Taiwan and Singapore and the GPU Technology Conference in Tokyo. It has been great to see so many old and new customers. Here are some of the highlights.

At each of the shows, our Tegra-based Jetson TK1 development kit drew huge crowds. The Jetson board makes a great demo: plug in a camera, fire up the included NVIDIA VisionWorks library and demo programs, and you can instantly see the potential of the TK1’s 326 GFLOPS of compute power. In many countries NVIDIA enthusiasts seem to have snapped up all of our local distributors’ initial orders, but if you are lucky enough to have received a Jetson, a must-visit developer site is the elinux.org Jetson site. Customers have all sorts of clever ideas for things to do with Jetson, including the prototype below, whose owner allowed me to take a photograph as long as I didn’t reveal any additional details.
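
For the curious, that 326 GFLOPS figure is simple arithmetic: 192 Kepler CUDA cores × 2 floating-point operations per clock (a fused multiply-add) × the GPU clock. Here is a minimal CUDA sketch of the same calculation, assuming a Kepler-class (SM 3.x) part like the TK1, where the 192-cores-per-SM constant applies:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "No CUDA device found\n");
        return 1;
    }
    const int coresPerSM = 192;               // Kepler (SM 3.x) only
    double ghz    = prop.clockRate / 1e6;     // clockRate is reported in kHz
    double gflops = 2.0 * coresPerSM * prop.multiProcessorCount * ghz;
    printf("%s: %d SM(s) @ %.0f MHz -> ~%.0f peak SP GFLOPS\n",
           prop.name, prop.multiProcessorCount, ghz * 1000.0, gflops);
    return 0;
}
```

On a Jetson TK1 (one SM at roughly 850 MHz) this works out to the advertised ~326 GFLOPS.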

While not yet sporting the latest TK1 chip, the shiny red Audi A3 we had on display at the Tokyo show came complete with its NVIDIA Tegra-powered infotainment system.

Sorry, the Audi-wrapped Tegra costs a bit more than the $192 (US) Jetson board.

Today we officially introduce the Asia instance of our TryGrid site. NVIDIA has been serving up 24-hour GPU-accelerated graphics sessions from cloud locations in the US and Europe for several months now, and the new Asia site brings the demonstration service to Asian users without trans-Pacific latencies. You can, however, connect to any of the TryGrid sites and compare the latencies for yourself. The live demo during my keynote today went off flawlessly (thanks, Masaki!). Because it works so well, I keep having to remind people that TryGrid is a demonstration tool only. We are not operating it as a commercial service, and you can’t buy TryGrid from NVIDIA, but many of our partners offer desktop-as-a-service with GPU-accelerated graphics powered by NVIDIA GRID technology.
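
If you want to compare the regional sites quantitatively, TCP connection time is a reasonable first-order proxy for session latency. A rough, host-only sketch (Linux/OS X; the hostnames below are placeholders for illustration, not the actual TryGrid endpoints):

```cpp
#include <chrono>
#include <cstdio>
#include <netdb.h>
#include <sys/socket.h>
#include <unistd.h>

// Time a TCP connect to host:port as a rough round-trip latency proxy.
static double connect_ms(const char* host, const char* port) {
    addrinfo hints{}, *res = nullptr;
    hints.ai_family   = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, port, &hints, &res) != 0 || !res) return -1.0;
    int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    double ms = -1.0;
    if (fd >= 0) {
        auto t0 = std::chrono::steady_clock::now();
        if (connect(fd, res->ai_addr, res->ai_addrlen) == 0) {
            auto t1 = std::chrono::steady_clock::now();
            ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
        }
        close(fd);
    }
    freeaddrinfo(res);
    return ms;
}

int main() {
    // Placeholder endpoints; substitute the hostname each regional site uses.
    const char* sites[] = {"us.example.com", "eu.example.com", "asia.example.com"};
    for (const char* s : sites)
        printf("%-18s %8.1f ms\n", s, connect_ms(s, "443"));
    return 0;
}
```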

Of course, High Performance Computing powered by NVIDIA Tesla GPUs continues to be a big focus of my work. It was great to catch up with John Taylor from Australia’s CSIRO, who was a keynote speaker at the Singapore workshop. It was also great to see Cristina Beldica from NCSA’s Blue Waters, who headlined the Singapore HPC session and talked about how Blue Waters’ scientists are scaling codes across thousands of GPUs.

Another day of customer meetings around Tokyo on Thursday, and then home on Friday. And to any of my co-workers from the States who may indulge in too much Japanese or other Asian food this weekend, John Taylor at CSIRO recommends this video.


Living On The [NVIDIA] Grid

The primary computer I use every day is a very nice current-generation MacBook Pro with an NVIDIA GeForce GT 750M GPU. Yet even with all that graphics horsepower on my laptop, for the last several weeks I have been spending my workday running an NVIDIA graphics-accelerated VDI session. You can experience the same by signing up for a free 24-hour session at our TryGrid site. So why use graphics-accelerated VDI when you have perfectly good graphics already? Let’s go through some of the reasons.

Since my VDI session is professionally administered in the cloud (the private cloud in NVIDIA’s data center, for our corporate VDI instance), I never have to worry about saving or copying files from my laptop drive to a RAID-protected, backed-up, managed file system. Sure, with my laptop I can map a network drive to make it appear like a local drive, but since my VDI session lives in the data center, my files are never stored locally, unless, of course, I go out of my way to copy them from the cloud to my local desktop. My VDI session automatically maps my laptop drive to the Windows H: drive on my VDI instance, so if I need a file, for later disconnected work for instance, it is easy to move. I won’t go into all the security benefits of having my files stay in NVIDIA’s data center rather than on my laptop’s hard drive, but that is another big benefit as well.

Monday mornings (or Tuesday this week, because of the US Memorial Day holiday) are also a lot easier with VDI. As soon as I reconnect to my VDI instance, all my browser windows, documents, and other windows are there exactly where I left them. And of course I don’t need my laptop to get there. Should I leave my laptop at home, or, worse, lose or damage it, I can get access to my VDI instance from any corporate PC. Persistence of VDI sessions is a huge benefit that is often overlooked.

Another advantage of VDI is that our VDI servers are connected to the world via 10G Arista switches. We have pretty good WiFi on campus, but big files still move a lot faster coming down a 10G pipe directly into our corporate data center than they do traveling out over our campus network and eventually into a relatively skinny WiFi signal or a building’s 1G-or-slower Ethernet. The same goes for storage: my Mac has a fast SSD, but for huge files it is no match for our corporate NetApp file servers.
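
Back-of-the-envelope numbers make the point. A sketch of ideal line-rate transfer times for a large file (the link speeds are illustrative, and real-world throughput will of course be lower):

```cpp
#include <cstdio>

int main() {
    const double fileGB = 10.0;                 // e.g., a large video or CAD dataset
    const double bits   = fileGB * 8e9;         // gigabytes -> bits
    const struct { const char* link; double bps; } links[] = {
        {"10G data center Ethernet", 10e9},
        {"1G building Ethernet",      1e9},
        {"typical WiFi (100 Mbps)",  100e6},
    };
    for (const auto& l : links)
        printf("%-26s %6.0f s\n", l.link, bits / l.bps);  // 8 s vs 80 s vs 800 s
    return 0;
}
```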

Here is how I described our graphics-accelerated VDI to a family member this weekend: 1) it runs all your favorite graphics-accelerated apps, plus all your corporate apps, with the delightful visual experience you expect from NVIDIA; 2) like a Google Chromebook, your laptop becomes essentially stateless for anything running in your VDI session; and 3) because it’s connected to big, fat network and storage pipes in our corporate data center, any file- or network-intensive activity is just a lot faster.

Finally, NVIDIA brings a great solution for moving the corporate desktop into the [public or private] cloud: NVIDIA GRID.


NVIDIA GRID Test Drive

NVIDIA’s GRID vGPU technology brings the full benefit of NVIDIA hardware-accelerated graphics to virtualized solutions. Until this month, however, it was a bit cumbersome to experience and test, given the complexity of setting up a VDI environment, even a non-production test and demo environment. Well, wait no more. You can now test-drive our GRID technology for 24 hours at no cost by visiting nvidia.com/trygrid, and in less than five minutes you can be running your own hardware-accelerated VDI session hosted in the cloud. Here are a few more details on the test drive.

  • The GRID Test Drive is a technology demonstration of GPU-accelerated VDI in a virtualization-stack-agnostic way. It showcases the ultimate user experience available today with NVIDIA GRID.
  • The technology is hosted on public cloud infrastructure and is currently targeted at North American customers for the best user experience. We are working on expanding globally, including to Europe and Asia; this addition will be available soon.
  • GPU acceleration is provided by GRID K2 GPUs.
  • At the moment we offer a Windows client; client versions for Mac OS and Linux will be available soon as well. This is a limitation of the test drive environment only; Mac OS and Linux clients are available today for customer deployments.
  • Applications available for the Test Drive include AutoCAD, eDrawings, Teamcenter, MS Office, Digital Ira & Dawn, HTML5, and a 3D PDF viewer. In addition, users can install any software of their choice.
  • Each session is available for 24 hours. Users can log in multiple times and return to their personal virtual desktop within that 24-hour window.
  • Additional FAQs are available on the GRID Forum.

If you are an enterprise customer trying to separate the hype of VDI from the reality, the GRID Test Drive is a great way to experience the performance yourself. If you are a cloud service provider looking for new services to offer your customers, the GRID Test Drive is great proof that the technology is cloud-ready.

Enjoy!


Back to the Future with Cray

This week I was lucky enough to spend a day and a half in Lugano, Switzerland, attending the first part of the Cray User Group. It was great to meet up with old friends from Cray, their partners, and their customers. I was legitimately attending not only as a partner but as a Cray user, as my team at NVIDIA operates not one but two Cray systems, a Cray XK7 and a Cray XC30, both of course with NVIDIA GPUs, as part of our customer benchmark center. Our benchmark center also has hundreds of different x86 servers arranged into multiple clusters, all available to our customers for benchmarking.

The theme of this year’s Cray User Group, Back to the Future of Supercomputing, made me think back a decade to when I was responsible for the HPC benchmarking center at Sun Microsystems. At the time, the Sun center had some large (up to 64-socket) SPARC-based systems, some small low-power multicore/multithreaded SPARC Niagara (aka UltraSPARC T1) based systems, and a lot of different x86-based servers, which Sun was just starting to build. It was a challenge then for our benchmarking team to find the best mix of systems to maximize performance for any given customer workload.

While already equipped with quite a heterogeneous mix of systems and processors today, the NVIDIA benchmarking center promises to get a lot more heterogeneous over the coming year. Soon we will be getting our first OpenPower-based servers with GPUs, and after that I expect quite a few different ARM64-based servers supporting GPUs. I keep telling my team of solutions architects that this is great job security: as the mix of systems available on the market increases, customers have an increasing need for trusted technical advisors to help recommend the best systems.

Of course, the best system isn’t always easily defined. If you only consider two-socket x86 systems with 16 DIMM slots, the best system is often defined as the lowest-cost system. Even there, though, you need to be careful to consider CPU speed and power, DIMM speed and power, power supply efficiency, networking capabilities, and cooling efficiency. And cooling efficiency also requires knowledge of the customer’s data center infrastructure, as increasingly does power supply efficiency. At the same time, if you limit yourself to a two-socket x86 server and 16 DIMMs, you aren’t going to get a lot of innovation in that space.

With IBM refocusing their server and HPC efforts on OpenPower with NVIDIA GPUs, multiple ARM64-based processors set to enter the market in the next twelve months, continued innovation in x86, and every processor vendor adopting accelerators of some sort, there is going to be a lot more innovation in the server space. While this leads to more choices for customers, it doesn’t necessarily have to be more complicated. Talking to a customer from the Lugano-based Swiss National Supercomputing Centre, CSCS, I heard their future system goals stated very succinctly: “kilowatts to job completion.” CSCS knows a thing or two about supercomputing, as they run Piz Daint, the largest supercomputer in Europe, which also ranks as the greenest petaflop supercomputer in the world according to the Green500 list. A Cray XC30, Piz Daint pairs one NVIDIA GPU with each x86 processor in the system to achieve its performance and energy efficiency.

Finally, to end my blog, a bit of a blatant recruiting pitch. As noted above, the NVIDIA customer benchmarking center is growing. That means we are looking for another HPC sysadmin. If the thought of working with all the latest NVIDIA GPUs before they are announced, along with the latest supercomputers from Cray and all our other systems partners, and getting a jump on learning new OpenPower- and ARM-based servers appeals to you, get in touch with me. The position is based in our Santa Clara headquarters.


Next Generation Computer Interfaces

What we consider a “computer” will likely change even faster in the next decade than in the previous one, but the computers we use every day are likely to still have some sort of display and utilize a GPU for many years to come. When Windows XP was released in 2001, it touted a more intuitive user interface and expanded multimedia capabilities among a list of benefits. And happy indeed was the new Windows XP user in 2001 lucky enough to get the latest NVIDIA GeForce 3 graphics card with their new OS. Not so happy are those still using the now unsupported Windows XP. I bet Oculus CEO Brendan Iribe isn’t using Windows XP today. While it isn’t exactly clear what Brendan or his soon-to-be new boss Mark Zuckerberg think the intuitive user interface of the future will be, it clearly will be more graphics-rich and cloud-connected than today’s phone, tablet, laptop, or PC.

We might hear a bit more about the Oculus-Facebook vision next week at Disrupt NY 2014, when Brendan speaks. But here are a few thoughts of my own on next-generation computer interfaces.

  • Your computer interface will continue to become more cloud-connected. Already today, a large percentage of the data you interact with on your computer is coming from or going to the cloud and exists only ephemerally on your phone, tablet, laptop, or PC. Games are streamed from the cloud, and office documents live there, as do your photos, videos, voicemail messages, and troves of other data.
  • Not only will your data live in the cloud, but the image you see on your display will increasingly be rendered in the cloud. Oculus goggles offer a great virtual reality experience, but they are not very practical today for walking around town. Even the much smaller and more limited Google Glass is too imposing for most non-techies. The battery power of any mobile device ultimately limits its graphics performance, so moving the number-crunching part of graphics back into the cloud makes a lot of sense. Google Glass relies on servers in faraway Google data centers for major parts of its functionality. This is already happening today in the enterprise as well, using technologies like NVIDIA’s GRID vGPU to deliver high-end 3D graphics to almost any computer display. VMware’s recently announced plans to support vGPU will only accelerate enterprise adoption.
  • GPUs and CPUs will continue to co-exist, as the graphics demands of ever more visually rich consumer devices continue to grow faster than Moore’s Law. A general-purpose CPU needs to be good at doing small bits of work very quickly, so CPU memory architectures are optimized to move relatively small amounts of data from main memory into processor cache memory, to eventually be used by the processor. GPU memory architectures are optimized to move large amounts of data from main memory into the GPU, where aggregate bandwidth matters more than the speed of any single access. A simple comparison: a race car can speed around a track at 200 MPH, but eight cars moving at 50 MPH down an eight-lane highway have a combined throughput of 400 MPH. Both have their uses (see the measurement sketch after this list).
  • Larger displays require more graphics processing power. On a 15″ laptop, a roughly 1000×1000-pixel display is fine. On a 50″ TV, so-called 4K technology, roughly 4000×2000 pixels (3840×2160), is the new high-end standard. But if you want to display a 180-degree field of view, you need the equivalent of many 4K displays. You can get by with fewer pixels by moving the display closer to your eyes, as is done with goggles and other head-mounted displays, but even at a 1″ distance, the human eye can still distinguish between millions of pixels.
  • Computer displays are fairly boring if you don’t have a lot of content to display on them. Conversely, a terabyte of data isn’t very interesting if you can’t display it, manipulate it, and interact with it. If your data sits in the cloud, then it is a lot more efficient to generate your display in the cloud than to copy all the data to your local computer to generate the display.
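
As promised above, here is a minimal CUDA sketch of the “eight-lane highway” side of the comparison: timing a bulk copy of data into the GPU with CUDA events to compute effective bandwidth. Error checking is omitted for brevity:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 256 << 20;        // 256 MB test buffer
    float *host, *dev;
    cudaMallocHost(&host, bytes);          // pinned host memory for full-speed DMA
    cudaMalloc(&dev, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("Host -> device: %.1f GB/s\n", (bytes / 1e9) / (ms / 1e3));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dev);
    cudaFreeHost(host);
    return 0;
}
```

No single byte moves any faster than it would on the CPU side; the win is in how many bytes move at once.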

How all this translates into future displays and interactions with the digital world remains to be played out across Silicon Valley and the other high-tech centers of the world. So while many question the logic of Facebook’s pending acquisition of Oculus, it makes perfect sense to me. It has very little to do with the current Oculus goggles, but it has everything to do with the future of computer interfaces and how we interact with all the world’s data.


IBM’s Laser Focus on OpenPower

With IBM’s pending sale of its x86 server business to Lenovo, their remaining server business is squarely focused on Power, and that holds great promise for innovation via the OpenPower Foundation. The concept of an “open” CPU architecture is not new. In 2006, Sun Microsystems released the complete design of its UltraSPARC T1 processor to OpenSPARC.org. Oracle’s acquisition of Sun in 2010 ensured that effort didn’t continue, although OpenSPARC is still used in a few university courses. With 26 members and counting, including the likes of Google, NVIDIA, Mellanox, and Samsung, the OpenPower Foundation appears set for greater success, as demonstrated at today’s OpenInnovation Summit.

As part of its work with the OpenPower Foundation, NVIDIA is adding CUDA software support for NVIDIA GPUs paired with IBM POWER CPUs. IBM and NVIDIA are demonstrating the first GPU accelerator framework for Java, showing an order-of-magnitude performance improvement on Hadoop analytics applications compared to a CPU-only implementation. NVIDIA will also offer its NVLink™ high-speed GPU interconnect as a licensed technology to OpenPOWER Foundation members.

GPUs with NVLink are the perfect accompaniment to OpenPower. There was plenty of discussion at today’s OpenInnovation Summit about the powerful memory subsystem of POWER8. Even the less powerful CPU memory subsystems available today can outpace the fastest PCIe interfaces to the GPU. NVIDIA has worked closely with IBM on roadmaps, and NVLink promises to keep up with the memory systems of POWER8 and beyond, ensuring that GPUs on OpenPower systems will be able to take full advantage of all the CPU memory bandwidth. In turn, this will enable OpenPower systems to realize the full potential of the Unified Memory architecture introduced in CUDA 6.
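
Unified Memory is easiest to appreciate in code. In this minimal CUDA 6 sketch, a single pointer from cudaMallocManaged is visible to both the CPU and the GPU, with the runtime migrating data on demand; the faster the CPU-to-GPU link, the cheaper that migration becomes, which is exactly where NVLink helps:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Scale an array in place on the GPU.
__global__ void scale(float* x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main() {
    const int n = 1 << 20;
    float* x;
    cudaMallocManaged(&x, n * sizeof(float));    // one pointer, visible to CPU and GPU

    for (int i = 0; i < n; ++i) x[i] = 1.0f;     // touch the data on the CPU

    scale<<<(n + 255) / 256, 256>>>(x, 2.0f, n); // same pointer, now used on the GPU
    cudaDeviceSynchronize();                     // wait before reading on the CPU

    printf("x[0] = %.1f\n", x[0]);               // prints 2.0
    cudaFree(x);
    return 0;
}
```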

IBM has a built-in market for future Power + GPU systems in its own software divisions. At the GPU Technology Conference earlier this year, IBM demonstrated some of its commercial software applications written in Java running on a Power + GPU test server. Java applications typically have many parallel threads that can be accelerated through the use of GPUs. And of course CUDA-based code, which executes on the GPU, is basically unchanged regardless of whether that GPU is attached to a Power, ARM, or x86-based CPU.

With the OpenPower Foundation, IBM appears to have taken the motto “innovation happens everywhere” to heart and already has a good start on driving adoption. As I mentioned in my last blog, there are already companies in China working on OpenPower designs. It is great to see that innovation in the processor and server world is alive and well, promising to give IT customers more choices and greater value well into the future.


China HPC Perspectives – Part 2

Today I continue my China HPC Perspectives series by talking about what some of the Chinese IT companies are up to in the HPC space. Even before Lenovo’s recent announcement that it would acquire IBM’s x86 server business, local Chinese vendors were capturing an increasingly large share of the domestic server market. According to this China Economic Net article, local manufacturers now account for 40% of a server market which, IDC reports, is growing revenue at 30% a year.

Let’s take a look at the Chinese IT giant Huawei. While Huawei isn’t a big name in the HPC space, they have been steadily adding HPC capabilities for the new style of commercial HPC applications, like machine learning, that I discussed in Part 1 of this series. If you still think Huawei only makes telco-class routers, you clearly haven’t been reading The Register. Back in 2012, Huawei took a big step toward the HPC server market with the introduction of InfiniBand options for their E9000 blade servers. Then last year Huawei followed up by announcing plans to work with NVIDIA on GPU virtualization. While Huawei is expanding in many markets with HPC requirements, their original core customer base, large telcos, is also looking to HPC and big data applications running on GPUs. I expect Huawei took notice of this press release on how the European telco Orange is using NVIDIA GPUs to power new big data apps. I wonder if Orange is a Huawei customer?

As in other parts of the world, all of the Chinese server vendors I met with last week were working on ARM server designs. A number of Chinese firms are ARM licensees, and I expect we will see multiple server-class ARMv8 64-bit processors come out of China in the next two years. Last month’s NVIDIA NVLink announcement is especially interesting to ARM processor and server vendors. With NVLink, server vendors can connect their own ARM processor via a high-speed NVLink channel to one or more NVIDIA GPUs and offer innovative designs for specific HPC and machine learning workloads without being tied to the traditional two-socket, PCIe-bus server design. In addition to ARM, there is also Chinese interest in building OpenPOWER-based servers with NVIDIA GPUs. Baidu is likely to have quite a few compelling new local sources over the next few years for GPU-powered systems on which to run its machine learning algorithms.

A large part of China’s success in the IT market has been supported by an ever-growing base of open and industry standards. While only a relatively small number of users have access to the 7000+ GPUs in Tianhe-1A, computer science students at virtually every Chinese university can write CUDA code on PCs and laptops with NVIDIA GeForce GPUs, or by accessing GPUs in the cloud. Many of those Chinese clouds run on OpenStack, no doubt one of the reasons Huawei is a Gold Member of the OpenStack Foundation, supporting the foundation at the same financial level as Cisco.

It will be interesting to watch the global HPC market continue to evolve over the next few years. There is no shortage of demand for ever more powerful traditional scientific processing to forecast the weather more accurately or develop new life-saving medications. But increasingly, new commercial HPC workloads such as machine learning promise to require equal if not greater HPC capabilities. Of course, key to supporting this increased demand is improved energy efficiency, as tracked by the Green500 list. With all ten of the top 10 systems on the current Green500 powered by NVIDIA GPUs, we know a little about energy efficiency. And of course China is not alone in seeking to increase the locally developed content of new energy-efficient HPC systems. Europe, Japan, Korea, and others, including of course the US, continue to develop their own HPC hardware and software technologies, helped along by new and open industry standards such as ARM and OpenStack.

As the world leader in visual computing, NVIDIA looks forward to working with every nation in the world to continue to drive forward innovation in HPC in an open, collaborative environment. When a student in any country can learn to program CUDA, be competitively admitted to one of the country’s best universities, and get competing job offers from NVIDIA and Baidu, innovation happens. When server vendors can connect an NVIDIA GPU to their processor of choice without speed limits, innovation happens. When cloud computing brings the power of the world’s largest supercomputers to everyday users, innovation happens.

So, with my last sunset of this trip to China, I’m more excited than ever about the future of HPC.
