I’ve enjoyed a few days of real vacation this week and stayed mostly offline, so it’s time to catch up with this week’s HPC Friday. My inspiration today comes from this week’s GigaOM interview with Andy Bechtolsheim on emerging hardware and software advances enabling real-time data mining. I was lucky enough to work with Andy during his second stint at Sun Microsystems, and anyone who knows Andy knows that every conversation with him is an intellectual delight. If you want to hear more about Andy’s Sun days, check out this 2006 Sun Founders Panel interview from the Computer History Museum hosted by John Gage. OK, so I did sneak online a bit during vacation.
As Andy mentions, there is a tremendous amount of work going into big data software for real-time analytics. HP is certainly investing in this area: with internal research and development, through acquisitions like Vertica, with partners like Cloudera, and through the Apache Hadoop project and other open-source projects.
Anyone who has ever purchased a book on Amazon has experienced the benefits of real-time analytics via those helpful recommendations Amazon gives you for other books you might enjoy reading. Most people don’t find Amazon’s recommendations intrusive, in part because they are custom-tailored to your purchase history and to the other items you have viewed on the Amazon web site. On the other hand, everyone has received spam email, most of which is based on zero analytics, and that you certainly did find intrusive.
Of course, shopping and advertising, while perhaps the first things that come to mind when discussing real-time analytics, are just one of many uses of the technology. From detecting fraud in credit card transactions to monitoring traffic conditions and suggesting alternate, more fuel-efficient routes, there are many uses of real-time analytics. The challenge is that as more devices connect to the web, and as those devices generate more and more data, more advanced hardware and software approaches are needed for real-time analytics. This is an area where High Performance Computing (HPC) and technologies from the web world are just starting to play together.
The use of HPC technologies to understand the human mind is not new, and in fact started long before Google-scale analytics. This 2002 paper by UCLA’s Dr. Toga, Imaging Databases and Neuroscience, talks about a very different way to analyze the human brain. I still clearly remember meeting with Dr. Toga and Andy Bechtolsheim in 2004 at UCLA’s Laboratory for Neuro Imaging (LONI), where Dr. Toga talked about the rapidly increasing challenges of dealing with the data explosion of brain images as new generations of MRI scanners and other instruments generated exponentially more data. Over lunch with Dr. Toga in Westwood, Andy sketched out on the back of a paper napkin what would become one of Sun’s future HPC servers.
Neural imaging isn’t always the best lunchtime conversation if you have a weak stomach. Because the highest-powered imaging devices today can be harmful to living brains, many of the highest resolution images available come from frozen human brain cryosections.
Thanks to years of work by researchers at LONI and elsewhere, entire suites of neural imaging software like BrainSuite are readily available. BrainSuite is a suite of image analysis tools designed to process magnetic resonance images (MRI) of the human head.
So what do you get when BrainSuite meets Hadoop? For one thing, you will need more powerful compute infrastructure. As Andy discussed in his GigaOM interview, most early Hadoop clusters were built out of simple commodity x86 servers, with 1Gb Ethernet network interfaces and local, relatively low-speed, disk drives. That worked just fine because Hadoop was written for such an architecture. Today, however, as people try to do more and more with real-time analytics, and as the components of HPC servers like 10Gb Ethernet, GPUs, and Flash technologies are driven by commodity technology price/performance curves, it is natural that we will see these types of HPC technologies applied to the execution of Hadoop and other web software.
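The MapReduce model that Hadoop implements is what let it thrive on those simple commodity boxes: embarrassingly parallel map tasks run near the data on local disks, and only the shuffled intermediate pairs cross the slow network. As a minimal sketch (a local Python simulation of the two phases, not Hadoop’s actual Java API; the word-count job and all names here are illustrative):

```python
# Minimal local simulation of Hadoop's MapReduce model.
# In a real cluster, map tasks run on many commodity nodes near their
# data, and the framework shuffles/sorts pairs between map and reduce.
from itertools import groupby
from operator import itemgetter


def map_phase(records):
    """Map: emit (key, 1) pairs -- here, one pair per word."""
    for record in records:
        for word in record.split():
            yield (word, 1)


def reduce_phase(pairs):
    """Reduce: after the shuffle/sort, sum the counts for each key."""
    shuffled = sorted(pairs, key=itemgetter(0))  # stand-in for the shuffle
    for key, group in groupby(shuffled, key=itemgetter(0)):
        yield (key, sum(count for _, count in group))


if __name__ == "__main__":
    data = ["big data", "real time data"]
    print(dict(reduce_phase(map_phase(data))))
    # -> {'big': 1, 'data': 2, 'real': 1, 'time': 1}
```

Because only the compact (key, count) pairs move between phases, slow 1Gb links and local disks were good enough; it is when the per-record work gets heavy, as with image analysis, that 10Gb Ethernet, GPUs, and Flash start to pay off.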
For systems companies like HP, this presents numerous opportunities to generate value. An x86 CPU, an Ethernet chip, a flash chip, and a GPU, each considered separately, are commodity components that benefit from Moore’s law and its related price/performance curve. An advanced commercial HPC cluster, a Hadoop cluster, or an HP Performance Optimized Data Center (POD), while built out of commodity, industry-standard components and thus benefiting from the underlying technology price/performance curves, represents significant engineering effort.
To ponder what will be possible in a few years with continued advances in neural imaging and real-time analytics, one need look no further than the April 2011 issue of UCLA Magazine and its feature article entitled Head Games. The implications, as the article states, are mind-boggling. And as Andy told GigaOM, they are enough to keep him busy developing ever more advanced computer technology until at least 2030, when he plans to retire. Andy, here’s to 2030!