The Isambard Project, a GW4 Alliance initiative, recently disclosed the latest results from its benchmarking of Arm-based processors for HPC, the first such results for dual-socket Cavium ThunderX2 nodes. Prof. Simon McIntosh-Smith gave a talk at the Cray User Group (CUG) Conference in Stockholm on May 23, 2018, in which he described the results of a detailed performance comparison between Cavium™ ThunderX2® Arm®-based CPUs and the latest state-of-the-art Intel Skylake x86 processors. Results focused on the HPC codes that are most heavily used on the UK’s national supercomputer, ARCHER, and showed that for these kinds of workloads, ThunderX2 is competitive with the best x86 CPUs available today, but with a significant cost advantage.
These results are important for a number of reasons. Firstly, this is the first time that Arm-based CPUs have been performance competitive for mainstream HPC workloads. For users, this means that we have a new set of processor vendors becoming relevant, giving us much more choice today than at any point in the last decade. Secondly, these performance-competitive, Arm-based processors are significantly cheaper than those shipping from the incumbent vendors, by a factor of 2-3X, depending on which SKUs you compare. This cost advantage reverses the trend of the last 5 years, which has seen price/performance increases slow to a historical minimum.
The new set of vendors developing Arm-based HPC processors are choosing different trade-offs to those in current mainstream CPUs. For example, the Cavium ThunderX2 CPUs in the Isambard Cray XC50 system have focused on delivering class-leading main memory bandwidth, rather than peak FLOP/s. This can be seen in their design, where they have devoted more of the silicon area to integrate 8 DDR4 memory channels per socket (compared to the 6 in Intel’s Skylake and 4 in Broadwell), while leaving floating-point vector width at 128 bits (compared to 512-bit in Skylake and 256-bit in Broadwell). This gives users a wider choice; for heavily vectorisable, floating-point intensive codes, the wide vectors of Skylake might be ideal. For memory-bandwidth bound codes, or codes that are more balanced between data transfer and FLOP/s, ThunderX2 might give the best solution.
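To make the trade-off above concrete, here is a minimal back-of-the-envelope sketch in Python. The channel counts and vector widths are the ones quoted in the paragraph; the memory speed (DDR4-2666 on every part) is an assumption made purely for illustration and is not taken from the Isambard results. The function and variable names are hypothetical.

```python
# Per-socket figures quoted in the paragraph above.
CPUS = {
    "ThunderX2": {"ddr4_channels": 8, "vector_bits": 128},
    "Skylake": {"ddr4_channels": 6, "vector_bits": 512},
    "Broadwell": {"ddr4_channels": 4, "vector_bits": 256},
}

# A DDR4 channel has a 64-bit data bus, i.e. 8 bytes per transfer.
BYTES_PER_TRANSFER = 8

# ASSUMPTION (not from the article): every channel runs at DDR4-2666.
# Real parts may support different speeds; change this to match your system.
ASSUMED_MEGATRANSFERS_PER_S = 2666


def peak_bandwidth_gb_s(channels, mt_per_s=ASSUMED_MEGATRANSFERS_PER_S):
    """Theoretical peak memory bandwidth per socket in GB/s (10^9 bytes/s)."""
    return channels * BYTES_PER_TRANSFER * mt_per_s / 1000.0


def doubles_per_vector(vector_bits):
    """Number of 64-bit floating-point values that fit in one vector register."""
    return vector_bits // 64


def summarise(cpus=CPUS):
    """Return theoretical peak bandwidth and vector capacity for each CPU."""
    return {
        name: {
            "peak_gb_s": peak_bandwidth_gb_s(spec["ddr4_channels"]),
            "doubles_per_vector": doubles_per_vector(spec["vector_bits"]),
        }
        for name, spec in cpus.items()
    }


if __name__ == "__main__":
    for name, row in summarise().items():
        print(f"{name:10s} {row['peak_gb_s']:7.1f} GB/s peak, "
              f"{row['doubles_per_vector']} doubles per vector")
```

Under the equal-speed assumption, the ratio between any two parts depends only on the channel counts: 8 channels against 6 gives ThunderX2 roughly a third more theoretical bandwidth per socket than Skylake, while Skylake's 512-bit vectors hold four times as many double-precision values as ThunderX2's 128-bit vectors. These are theoretical peaks only; sustained bandwidth and achieved FLOP/s on real codes will be lower and are what the benchmarks in the paper measure.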
But there’s another, longer-term benefit that Arm brings to HPC: the potential for real co-design. This is a term that’s often bandied around, and there’s evidence of real co-design at the system level. At the chip level, however, HPC has been too small a market to warrant the traditionally large non-recurring engineering costs that it would take to customise CPUs for our applications. The Arm ecosystem was designed exactly to make this possible. Arm has hundreds of customers, each designing bespoke processors based on Arm’s IP, and doing this much more quickly and at much lower cost than before. This raises the prospect of the “new golden age for computer architecture”, as identified by John Hennessy and David Patterson in their recent ACM/IEEE ISCA 2018 Turing Lecture. In this future, Arm-based processors could be highly customised for HPC, adding instruction set extensions such as very wide vectors, or co-processors and accelerators for AI or other applications. These processors will be highly differentiated from high-volume mainstream datacenter parts, and should bring significant steps forward in performance for scientists around the world who have become increasingly frustrated with the relatively small improvements in performance and performance per dollar we’ve seen in recent years. As such, Arm’s entry into the HPC market, and the injection of new ideas, innovation and competition this brings, could trigger a revolution in scientific computing of the kind not seen since the commodity CPU revolution of the late 1990s. Exciting times are ahead.
For full details of the latest Isambard results, see the CUG 2018 full paper: Comparative Benchmarking of the First Generation of HPC-Optimised Arm Processors on Isambard. S. McIntosh-Smith, J. Price, T. Deakin and A. Poenaru, CUG 2018, Stockholm, May 2018. CUG 2018 attendees can access the password-protected conference proceedings here.
The slides from the talk are also now available. An extended version of the paper with additional results will appear in a special issue of the journal Concurrency and Computation: Practice and Experience (CCPE) later this year.
Simon McIntosh-Smith is a professor of high performance computing, head of the HPC Research Group at the University of Bristol, and PI for the Isambard project. Follow @simonmcs on Twitter for more news from the HPC Research Group in Bristol.
If you’ll be attending ISC High Performance this week in Frankfurt, join Simon McIntosh-Smith, Cray’s Luiz DeRose, and other industry experts for a half-day workshop, “X86, Arm, GPUs, Oh My!” on Thursday morning. Professor McIntosh-Smith’s workshop keynote will be “Performance Portability Progress: Early Experiences Porting to Isambard, the World’s First Production Arm Supercomputer.”