The Software Imperative in High Performance Computing

Software? Hardware and networking seem to get all the headline attention when it comes to high performance computing. Systems get described by the number of cabinets in a system, compute blades in a cabinet, processors on a blade, cores in a processor, clock frequency, nanometer silicon fabrication technology, and more. The supercomputing industry is infatuated with hardware speeds and feeds. But while hardware features often generate the most buzz, high performance computing software is equally important.

Whether your domain is earth sciences, military/defense, energy, life sciences, manufacturing or another field, today’s supercomputing requirements are increasingly demanding. Differentiated software is helping close the technology gaps in supercomputing that hardware enhancements alone can’t.

Jay Gould, Sr. Product Marketing Manager of HPC solutions at Cray, has considerable experience working in software domains across numerous high tech segments and industries. He recently related how high performance computing software has become an imperative to producing efficient, reliable, high performance supercomputers.

How critical is software to a supercomputer?

You can have the largest supercomputer in the world, but you need powerful, well-designed system software to make it work. While “software” is a general term that can mean many different things, in the supercomputing arena it’s vitally important that we be aligned with users on software utilities like compilers, debuggers, visualization and performance analysis tools.

Users can write their own scientific and engineering codes, programs or algorithms in a number of different software languages. Independent third-party vendors offer domain-specific applications for unique technology challenges. These HPC systems and subsystems need to run intelligent operating systems and software libraries. Supercomputing environments need job management and resource allocation tools. All of these different “software” areas are critical, and all need to be fully addressed in high performance computing for users to be successful.

What does “HPC-optimized software” mean?

You can’t just clock processors faster or add more cores and expect a supercomputer to go faster. What if the code is not performance-optimized? What if the system SW is not built to recover from component or network failures? What if your off-the-shelf OS does not scale? Software in general — the programming environment, the operating system, the support tools and the actual application code — is just as critical to the success of a system as the hardware and networking it runs on.

HPC vendors like Cray will always partner with the leading silicon vendors to leverage their expertise and advances in processors, co-processors and accelerators, not to mention network technology. Now, however, to offer HPC systems that deliver some of the highest levels of sustained performance and scalability, innovative vendors have to invest in strategic initiatives to optimize programming environments, operating systems and support tools specifically for HPC applications. Commercially available off-the-shelf products can be HPC-unaware and inappropriate for demanding supercomputing applications.

Invest in software? How do you measure that?

There has been a shift over the last 10-20 years in many high tech fields, and the HPC industry is no exception. It used to be that hardware engineers wrote software when they had to. In fact, the joke was that the first software programming tool was actually a soldering iron. Then engineering roles got more specialized, and hardware abstraction meant that software experts could code generically for evolving hardware platforms without being HW experts themselves. At that time you might have had a 1:1 ratio of HW to SW engineers.

Today, Cray has invested to the point that it employs many more software engineers than hardware engineers to support these different “HPC-optimization” areas. It might not be intuitively obvious that a supercomputer company would have evolved to a 4:1 or 5:1 ratio of software to hardware engineers.

Why a “software imperative”? Can you provide some specific examples?

It is imperative that HPC vendors deliver productivity tools to make users more efficient and to improve their time-to-scientific-discovery, or time-to-solution. It is imperative to design for sustained performance and scalability in software applications. And it is imperative to design resilient high performance computing software systems that can keep a large, important job running to completion by recovering from individual component failures or re-routing around disabled compute nodes or network connections.
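The resilience idea described above — keeping a long job alive across component failures — is commonly built on checkpoint/restart: periodically record progress, and after a failure resume from the last checkpoint instead of starting over. As a rough, generic illustration (not Cray’s actual system software — the function names and failure model here are invented for this sketch), a retry loop over a checkpointed job might look like:

```python
def run_job(total_steps, start, fail_at):
    """Run steps [start, total_steps). `fail_at` is a set of steps
    where a simulated transient node fault interrupts the job."""
    for step in range(start, total_steps):
        if step in fail_at:
            fail_at.discard(step)          # transient fault clears once hit
            return ("failed", step)        # report last checkpointed step
    return ("done", total_steps)

def resilient_run(total_steps, fail_at, max_retries=10):
    """Re-launch the job after each failure, resuming from the
    last checkpoint rather than from step zero."""
    checkpoint = 0
    for attempt in range(1, max_retries + 1):
        status, reached = run_job(total_steps, checkpoint, fail_at)
        if status == "done":
            return attempt                 # number of launches needed
        checkpoint = reached               # resume from the failure point
    raise RuntimeError("job did not finish within retry budget")

# Three simulated faults -> the job completes on the fourth launch,
# without ever repeating completed work.
attempts = resilient_run(100, {10, 50, 75})
print(attempts)  # → 4
```

The key design point the example shows is that recovery cost is bounded by the checkpoint interval, not by total job length — which is why resilience has to be designed into the software stack rather than bolted on.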

Cray has optimized the royalty-free Linux operating system for scalable high performance execution on our computing network — something you can’t get with a standard open source distribution. Cray created the “supercomputer” technology category and has decades of experience optimizing SW compilers for high performance parallel execution. We have developed sophisticated methods to run off-the-shelf ISV applications (Cluster Compatibility Mode – CCM) on the same HPC systems that also streamline the execution of home-grown user codes and programs (Extreme Scalability Mode – ESM).

We want our users to be productive and get timely ROI on their supercomputing investments. We find it imperative that Cray systems provide stability and resiliency as well as the scalable performance Cray is known for. That is why Cray will continue to invest heavily in innovative high performance computing software differentiation.
