China HPC Perspectives – Part 2

Today I continue my China HPC Perspectives series by talking about what some of the Chinese IT companies are up to in the HPC space. Even before the recent announcement by Lenovo to acquire IBM’s x86 server business, local Chinese vendors were capturing an increasingly large share of the domestic server market. According to this China Economic Net article, local manufacturers now account for 40% of the server market, which IDC reports is growing revenue at 30% a year.

Let’s take a look at the Chinese IT giant Huawei. While Huawei isn’t a big name in the HPC space, they have been steadily adding HPC capabilities for the new style of commercial HPC applications like machine learning that I discussed in Part 1 of this series. If you still think Huawei only makes telco-class routers, you clearly haven’t been reading The Register. Back in 2012, Huawei took a big step towards paying attention to the HPC server market with the introduction of InfiniBand options for their E9000 blade servers. Then last year Huawei followed up by announcing plans to work with NVIDIA on GPU virtualization. While Huawei is expanding in many markets with HPC requirements, their original core customer base, large telcos, is also looking to HPC and big data applications running on GPUs. I expect Huawei took notice of this press release on how European telco Orange is using NVIDIA GPUs to power new big data apps. I wonder if Orange is a Huawei customer?

As in other parts of the world, all of the Chinese server vendors I met with last week were working on ARM server designs. A number of Chinese firms are ARM licensees and I expect we will see multiple server-class ARMv8 64-bit processors come out of China in the next two years. Last month’s NVIDIA NVLink announcement is especially interesting to ARM processor and server vendors. With NVLink, server vendors can connect their own ARM processor via a high speed NVLink channel to one or more NVIDIA GPUs and offer innovative designs for specific HPC and machine learning workloads without being tied to the traditional 2-socket and PCIe bus server design. In addition to ARM, there is also Chinese interest in building OpenPOWER based servers with NVIDIA GPUs. Baidu is likely to have quite a few compelling new local sources over the next few years for GPU powered systems to run their machine learning algorithms on.

A large part of China’s success in the IT market has been supported by an ever growing base of open and industry standards. While only a relatively small number of users have access to the 7000+ GPUs in Tianhe-1A, computer science students at virtually every Chinese university can write CUDA code on PCs and laptops with NVIDIA GeForce GPUs or by accessing GPUs in the cloud. Many of those Chinese clouds run on OpenStack, which is no doubt one of the reasons Huawei is a Gold Member of the OpenStack Foundation, supporting the foundation at the same financial level as Cisco.

It will be interesting to watch the global HPC market continue to evolve over the next few years. There is no shortage of demand for ever more powerful traditional scientific processing to forecast the weather more accurately or develop new life-saving medications. But increasingly, new commercial HPC workloads such as machine learning promise to require equal if not greater HPC capabilities. Of course, key to supporting this increased demand is improved energy efficiency, as tracked by the Green500 list. With all ten of the top 10 systems on the current Green500 being powered by NVIDIA GPUs, we know a little about energy efficiency. And of course China is not alone in seeking to increase the locally developed content of new energy efficient HPC systems. Europe, Japan, Korea, and others including of course the US continue to develop their own HPC hardware and software technologies, helped along by new open and industry standards such as ARM and OpenStack.

As the world leader in visual computing, NVIDIA looks forward to working with every nation in the world to continue to drive forward innovation in HPC, in an open, collaborative environment. When a student in any country can learn to program CUDA, be competitively admitted to one of the country’s best universities, and get competing job offers from NVIDIA and Baidu, innovation happens. When server vendors can connect an NVIDIA GPU to their processor of choice without speed limits, innovation happens. When cloud computing brings the power of the world’s largest supercomputers to everyday users, innovation happens.

So with my last sunset for this trip to China,

I’m more excited than ever about the future of HPC.

Posted in Uncategorized | Leave a comment

China HPC Perspectives – Part 1

The last time I visited China several years ago, most of the HPC work was centered in the realm of scientific data processing. In November 2010, the Tianhe-1A system, with 7168 NVIDIA GPUs, had just been named the fastest supercomputer in the world on the Top500 list and outside of a few university research projects, convolutional neural networks were virtually unknown. What a difference a few years makes. While systems like Tianhe-1A are still being used extensively for scientific data processing, much of the growth of China’s HPC industry is centered around new commercial uses of HPC and NVIDIA GPUs, especially in the fast growing machine learning space.

Today, companies across China, like Internet giant Baidu, are using NVIDIA GPUs and high performance computing to drive their deep learning projects, as discussed last month at the GPU Technology Conference by Ren Wu, a distinguished scientist at Baidu, in his GTC talk S4651 – Deep Learning Meets Heterogeneous Computing. Any software developer in the world can study up on machine learning using online courses like Stanford professor Andrew Ng’s online Coursera course. Across China, Internet companies seem to be just as active at hiring machine learning experts as US giants Facebook and Google.

Speaking with the Chinese leader of NVIDIA’s acclaimed “DevTech” team of ninja CUDA programmers reinforces the message. With almost rock star like status at his alma mater Tsinghua University, Julien has no problem getting master’s degree and PhD graduates to apply for new DevTech positions. But Julien admitted to me that of the last half dozen job offers he made to CUDA experts, over half ended up taking positions instead at Chinese Internet companies working on machine learning. I did remind him this was still a win-win for NVIDIA. So if you are a top CUDA programmer in China and interested in working for NVIDIA, let me know and I will connect you with Julien. But only CUDA experts need apply: you might very well end up helping NVIDIA work with Baidu or another Chinese Internet giant, so you need to be the best of the best.

Like the US, Japan, and Europe, China still has plans to build giant HPC systems like Tianhe. However, increasingly these systems are being looked at to support commercial HPC workloads like machine vision in a cloud environment in addition to just scientific data processing. As one large HPC customer told me, “there are a lot of processors that we could use for future scientific data processing, but NVIDIA is unique in being able to address our entire spectrum of commercial HPC and big data workloads”. After all, what other processor is equally good at computing discrete spatial derivatives in 3D by doing 1D convolutions for an oil company’s seismic processing reverse time migration algorithm in the morning and then running a convolution to simulate depth of field in a video game streamed from the same supercomputer in the evening? Not to mention running some convolutional neural networks in between.
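For readers who haven’t seen the “3D derivatives from 1D convolutions” trick, here is a minimal sketch in plain Python. It is an illustration only, not anyone’s production seismic code: the function names, the second-order `[1, -2, 1]` stencil, and the tiny test grid are all my own assumptions, and a real reverse time migration kernel would use higher-order stencils and run on the GPU.

```python
# Second-order central-difference stencil for a second derivative: [1, -2, 1] / h**2.
# (Assumed for illustration; production codes typically use wider stencils.)
STENCIL = (1.0, -2.0, 1.0)


def second_derivative(grid, axis, h):
    """Convolve the 1D stencil along one axis of grid[x][y][z].

    Points on the two boundary planes of that axis are left at 0.0.
    """
    shape = (len(grid), len(grid[0]), len(grid[0][0]))
    out = [[[0.0] * shape[2] for _ in range(shape[1])] for _ in range(shape[0])]
    for x in range(shape[0]):
        for y in range(shape[1]):
            for z in range(shape[2]):
                idx = [x, y, z]
                if idx[axis] == 0 or idx[axis] == shape[axis] - 1:
                    continue  # the stencil would reach outside the grid
                acc = 0.0
                for offset, coeff in zip((-1, 0, 1), STENCIL):
                    p = list(idx)
                    p[axis] += offset
                    acc += coeff * grid[p[0]][p[1]][p[2]]
                out[x][y][z] = acc / (h * h)
    return out


def laplacian(grid, h):
    """Sum three 1D convolutions, one per axis, to approximate the 3D Laplacian.

    Only points that are interior along all three axes get a full result.
    """
    dx, dy, dz = (second_derivative(grid, axis, h) for axis in range(3))
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    return [[[dx[x][y][z] + dy[x][y][z] + dz[x][y][z] for z in range(nz)]
             for y in range(ny)] for x in range(nx)]


# Hypothetical test field f(x, y, z) = x^2 + 2*y^2 + 3*z^2 on a 5x5x5 grid.
n, h = 5, 0.5
field = [[[(x * h) ** 2 + 2 * (y * h) ** 2 + 3 * (z * h) ** 2
           for z in range(n)] for y in range(n)] for x in range(n)]
center = laplacian(field, h)[2][2][2]
print(center)
```

The test field is a quadratic, so the stencil recovers its Laplacian exactly at interior points. The point of the decomposition is that each pass touches memory along a single axis, which is the kind of regular access pattern that maps well onto GPU threads.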

One evening out with the local NVIDIA solution architecture team, the entire dinner conversation was dominated by speculation on how Baidu might be improving their visual search application. It came complete with many examples of the Baidu app translating pictures of delicious food for me, when I previously had no idea what I was eating. I love Chinese food, although sometimes it might be best not to know what I am eating. Not the case, however, with these Chinese river shells, both beautiful and delicious.

Later this week, in part 2 of my China HPC series, I’ll talk about what some of the Chinese hardware companies are doing to address the growing HPC market.

Posted in Cloud Computing, HPC | Leave a comment

China’s Retail Running Shoe Market

Despite my occasional posts on running and running shoes, I didn’t know much about China’s retail running shoe market until finding myself in Shenzhen last week only to discover I had left my running shoes at home.

Next week I’ll post a few more substantial blogs on the HPC market in China, but for weekend reading I thought I would share my tips should you find yourself in a similar situation. Unfortunately I wasn’t staying in a Westin hotel. The Westin Workout program that loans hotel guests a set of New Balance workout clothes and running shoes would have sure come in handy. Luckily, one of our local solution architects, Jack L., read my bad packing tweet and offered to come take me shopping.

There was a new Western-style shopping mall within a few minutes’ taxi ride of my hotel, filled with brand-name stores, supposedly including a New Balance store. In fact, one of the concierges at the mall entrance waved us down one of the many halls and indicated the way. Alas, we passed several brand name shoe stores but no New Balance. We even stopped to look at one of those little mall directory signs and no New Balance store to be found, in any language. So back to the first brand name store by the mall entrance. Now this particular brand is no newcomer to running shoes, but most of the shoes in the store were decidedly not for athletes (not that I consider myself much of an athlete) or runners but more for the fashion conscious. I did find a pair of what might pass for half-decent running shoes, at nearly a 100% mark-up to US retail prices. Pass. While I was looking, Jack talked to one of the store employees who directed us to another brand name shoe store on the second floor of the mall.

Alas, not much better luck here. Doesn’t anyone in China buy running shoes in a mall? Running barefoot for a week crossed my mind, but better judgement prevailed and I picked out a pair of overpriced shoes that perhaps might just get me through the week without doing major damage to my knees and proceeded to the cashier. Next challenge. Visa not accepted. Just to be fair, they didn’t take MasterCard or American Express either. But with some translation from Jack they offered to escort me to the other end of the mall to a cashier station where I could pay with Visa. So off we went.

As we walked around the corner, what did I see:

Apparently the directory signs in this particular mall only show you the stores on the current floor. So our stop at the first floor directory had shown no sign of the New Balance store. I let Jack break the news to the clerk from the other over-priced brand name store that I wasn’t going to buy their shoes. At this point to my surprise, not only did the store carry my favorite NB 890v4, they had them in my size!

And while readers of my blog know I’m biased toward New Balance, kudos to them for their global pricing strategy. The list price, converted to dollars, was almost exactly the same as the list price at my usual source, RoadRunner Sports. Not getting the 10% RoadRunner VIP program discount was a fair penalty to pay for my bad packing.

Next week I’ll be back with some more substantial posts sharing my perspectives on the HPC market in China. You won’t want to miss it.

Posted in Running | 1 Comment

NVLink To Drive Performance Innovations

One of this week’s big GPU Technology Conference announcements was NVLink. If you somehow missed the big announcements, NVIDIA’s Ian Buck does a great job explaining the highlights to insideHPC in the short video below.

Thinking back on my days working for server companies, NVLink is going to open up tremendous opportunities for innovation in the server space. Some of the basic NVLink configurations possible are illustrated below.

The large majority of data center servers today, when you lift up the covers, share the same basic design built around two CPUs. Long before server vendors added a GPU to any server, the PCI bus was used for all sorts of different add-on cards, including network adapters, storage controllers, and hundreds of different types of relatively low speed interfaces. A recent search for pci card on Amazon returned over 39,000 results. But I doubt the original creators of PCI ever envisioned connecting something as powerful as a modern GPU via PCI. Server vendors, processor vendors, and GPU vendors have gone to great lengths to continue to increase PCI performance, but with the requisite backward compatibility required, PCI simply has not kept up.

On Tuesday, in GTC session S4145, Chevron’s Thor Johnsen spoke about how they are using servers with 16 Kepler GPUs for high frequency elastic seismic modeling. That application clearly has very different requirements than one running on a server with just one or two GPUs. NVLink frees the server vendor to design servers that match GPU performance to CPU performance much more closely for a specific market. I expect that rather than start with two CPUs and then add GPUs as needed, server vendors will drive performance innovation by designing exactly as many GPUs and CPUs into a server as needed for different classes of applications.

NVLink is also expected to drive performance innovations across an ever broadening ecosystem of ARM and OpenPOWER processor vendors. Design cycles for modern processors can take several years or more, much longer than the design cycle for the servers that ultimately will use that processor. Avoiding the PCI bottleneck by combining a GPU and a CPU into a single processor chip has the disadvantage of fixing the CPU to GPU ratio at design time. By allowing a broad and flexible range of CPU to GPU ratios, NVLink allows many more possible performance innovations than a solution based on fixed CPU to GPU ratios.

Combined with the new 3D memory announced for our next generation Pascal GPU, along with a strong roadmap of new CUDA features, NVLink promises to drive performance innovations in a new generation of servers to address the insatiable computing demands of HPC, big data, and machine learning problems. Innovation is certainly alive and well at the GPU Technology Conference this week.

Posted in Uncategorized | Leave a comment

Trends To Look For At GTC

NVIDIA’s annual developer conference, the GPU Technology Conference, kicks off next week in San Jose, and even if you can’t be there in person you can watch the keynotes and other parts of the conference online. Many attendees will be repeat visitors to the conference and have a good idea of what to expect. First and foremost, this is our developer conference, so expect it to be jam packed with lots of technical sessions. With over 400 sessions, you probably will want to use the online session filter tool to search for the sessions you want to attend rather than reading through all 400+ session descriptions. Besides of course the keynotes, some of the sessions I’m most looking forward to are those presented by our customers. While we have some great NVIDIA speakers, many of the most highly rated repeat speakers come from our customer community. Having just come back from three weeks of travel meeting with many of our customers, here are some trends I expect will be highlighted during the conference.

Over the last several quarters we have had hundreds of customer trials of our virtual GPU (vGPU) technology. vGPU has nothing to do with virtual currencies (which I haven’t spotted any sessions on, although I definitely expect to hear some buzz about them during the show) but is our technology for bringing the full benefit of NVIDIA hardware-accelerated graphics to virtualized desktop solutions. Our vGPU technology brings GPUs and users’ desktops straight into the data center, be it an enterprise data center or a public cloud data center, bringing all of the advantages of virtualization that enterprises and public clouds have learned to love. In addition to hearing traditional commercial virtualization vendors like Citrix and VMware talk about our GPU solutions, I expect quite a bit of discussion about using GPUs with open source solutions like OpenStack.

But vGPU usage is by no means the only area where NVIDIA technology usage in the data center has grown substantially since the last GTC conference a year ago. We have seen an explosion in the use of GPUs for machine learning and pattern recognition. Much of this is going on in Internet data centers. A great example is discussed in this Netflix Blog discussing their experiences using GPU Instances on Amazon Web Services to run distributed neural networks for their recommendation engine. Commercial companies aren’t missing out on this trend either. With all the international travel I have been doing, I especially enjoy using the credit card from the provider who I know is using NVIDIA GPUs to identify potentially fraudulent transactions.

NVIDIA’s new Maxwell GPU has received a lot of press over the last several weeks for the energy efficiency it brings to high performance gaming laptops. Energy efficiency is a hot topic everywhere from laptops to electric car vehicle control systems to multi-megawatt supercomputer centers, so expect to hear a lot more about what NVIDIA and our customers are doing to drive energy efficiency.

And for my last trend to look for, think ARM. With many new ARM-64 processors already announced to ship this year, wherever I go in the world, everyone from small businesses to large governments asks me about using GPUs with ARM. There is so much innovation going on in this space right now, it is hard to bet against the success of ARM and the combined talents of all the companies working in the ARM ecosystem.

It promises to be an exciting week!

Posted in Uncategorized | Leave a comment

NVIDIA’s Collaborative Engineering Culture

NVIDIA has always encouraged customers to collaborate with our product engineering teams. It is part of our collaborative engineering culture which helps constantly refine and improve our products. While NVIDIA is sometimes better known to the consumer for the GPUs in our GeForce gaming cards, like the new GeForce Titan Black, NVIDIA is just as much a software company as a hardware company. NVIDIA develops and maintains millions of lines of source code focused on just one thing – visual computing – in our device drivers, in our CUDA development environment, and other visual computing software. Countless CUDA software developers around the world count on our software and collaborate with NVIDIA in various ways through our CUDAzone site. This week we launched some major improvements to CUDAzone specifically to make it easier for developers to collaborate with NVIDIA.

If you develop in CUDA and aren’t already signed up as a registered developer, that is one of the first steps to collaborating with NVIDIA. The new features launched this week allow our registered developers to enjoy the benefit of a true collaborative engagement experience with NVIDIA engineering. Registered developers can directly file bugs and be issued an nvbug ID, without going through the previous manual process that required intervention by an NVIDIA employee. Registered developers also are now able to see and respond to public comments and questions from our engineering teams and can communicate directly with NVIDIA engineering – and this will be part of our standard bug processing flow moving forward.

The user interface is simple and clear – developers do not need to complete any non-relevant fields when submitting issues relating to GPU Computing – and are prompted to provide info needed by engineering. Bugs will appear in nvbugs in the module DevPgm-CUDA, and notifications are sent to our QA team to allow for efficient followup.

So no matter if you’re running a complicated molecular dynamics simulation on thousands of GPUs on Titan or working on a CUDA project for your college computer science class while checking out the latest Assassin’s Creed game on your GeForce card, you are just a few clicks away from collaborating with NVIDIA engineering to help constantly improve our visual computing products.

Posted in Uncategorized

One Month Countdown to GTC

Exactly one month to go until NVIDIA’s GPU Technology Conference kicks off at the San Jose Convention Center on March 24th. As I walked through San Jose airport late last night, the place was already filled with banners advertising the event. Billed as “where the brightest minds come together and explore how GPUs are helping solve some of the world’s most complex challenges”, the GPU Technology Conference promises to remain true to its core as a GPU developer conference. Even with record attendance expected this year, GTC remains a highly technical event. If you are looking for a Las Vegas style junket, there are plenty of other shows you should attend instead.

With over 500 deep-dive technical sessions, there are plenty of talks, tutorials, and hands-on labs to attend. Some of the sessions even have catchy names, like “S4460 – Peer-to-Peer Molecular Dynamics and You”, by Scott LeGrand, Principal Engineer, Amazon Web Services. No, that isn’t a new AWS dating service. Scott will actually talk about how he optimized the AMBER Molecular Dynamics code using peer-to-peer copies and RDMA with MVAPICH2 and OpenMPI. Scott is a return speaker and a highly-rated one at that. Even if you have zero interest in molecular dynamics, his talk will be worth attending simply for the information on peer-to-peer and RDMA.

Perusing the GTC Session Listing, there is a marked uptick this year in the sessions on machine learning. S4753 – Visual Object Recognition Using Deep Convolutional Neural Networks by Rob Fergus, Associate Professor at NYU and Research Scientist at Facebook, sounds interesting based on the speaker’s associations even before one reads the abstract, although I give Scott and S4460 the upper hand in the “catchy title” category. There is probably a character limit on title length, otherwise I would have recommended “Not Your Parent’s Machine Vision – Visual Object Recognition Using Deep Convolutional Neural Networks”.

There is a very handy filtering tool on the GTC Session Listing to help you find the sessions best for you. Personally I’m sticking primarily to the “Advanced” sessions. “S4641 – Lattice QCD Using MILC and QUDA: Accelerating Calculations at the High-Energy Frontier” sounds like a great session, especially since speaker Justin Foley will talk about leadership-class facilities such as Blue Waters and Titan. “S4145 – High Frequency Elastic Seismic Modeling on GPUs Without Domain Decomposition” by Thor Johnsen of Chevron also sounds interesting; he will talk about taking advantage of “16 Kepler GPUs” [in a single server].

Of course, expect some fun and games too. You can get a firsthand look at the latest NVIDIA GTX 750, GTX 750 Ti, and GTX Titan Black graphics cards announced last week. But if you want to leave San Jose with a GTX Titan Black you might not be so lucky unless you order today. A quick check on Amazon seems to indicate “usually ships within 2 to 4 weeks” for most of the available cards. There are a few Titan Blacks listed on eBay at about a 50-70% premium; that’s more markup than a U2 concert in Dublin. Wednesday night’s GTC Party is likely to also score high on the fun quotient, but marketing won’t even tell me what’s in store for that event, so I’ll have to wait patiently like the rest of you.

If you are one of the lucky ones who managed to get his/her hands on a GTX Titan Black already, and you have not yet signed up for GTC, send me a short review that I can post on my blog and I’ll send a 50% GTC discount code your way.

Hope to see many of you at GTC!

Posted in Uncategorized