When you have as many as 600 different projects and 6,000+ diverse users, how do you implement a supercomputing solution to address such a broad range of applications? How do you optimize code performance across so many different workloads?
To meet challenging requirements like these, Cray developed the XC™ series supercomputers to support differing processor technologies in the same architecture. By combining the advantages of multicore and many-core devices, these hybrid Cray systems can match each application to the technology that gives it the best performance.
With this in mind, Cray has delivered its software tool stack with its highest-performing XC™ supercomputer configuration to date, built on the Intel® Xeon Phi™ processor, previously code-named “Knights Landing” or “KNL.” Learn more about KNL and the Cray XC supercomputer here. This new many-core device complements existing multicore Intel® Xeon® processor compute partitions and offers new capabilities including very fast, on-package high-bandwidth memory (HBM), wider vector units and more threads per core. These features will be particularly advantageous for applications that demand high memory bandwidth.
Users and their data-intensive compute applications
Cray successfully completed early validation of HPC codes on the processor then code-named “Knights Landing” in 2015, and at SC15 disclosed applications scaling to more than 10,000 cores, including GTC, HPGMG-FV, HPL, MILC, miniDFT, miniGHOST, OMB, SNAP and UMT.
Early customers like NERSC are very systematic about identifying codes that could benefit from the many-core architecture with HBM. With 600+ projects and 6,000+ users, NERSC consumes a substantial number of core hours per year. However, careful analysis revealed that many of those user projects spend their runtime in the same top 25 to 30 base codes. NERSC is therefore evaluating multiple codes in each of its science areas (advanced scientific computing research, biological and environmental research, basic energy science, fusion energy sciences, high energy physics and nuclear physics) to find which are the best fit for the new many-core device family.
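The kind of usage analysis described above can be sketched as a cumulative-coverage calculation: rank codes by core hours and count how many are needed to reach a target share of the total. The snippet below is only an illustration. The code names, core-hour figures and the 75% target are hypothetical rather than NERSC data, and `codes_covering` is a helper name invented here.

```python
def codes_covering(core_hours, target_fraction):
    """Return the top codes (largest consumers first) whose combined
    core hours reach target_fraction of the total, plus the share covered."""
    total = sum(core_hours.values())
    ranked = sorted(core_hours.items(), key=lambda kv: kv[1], reverse=True)
    selected, running = [], 0.0
    for name, hours in ranked:
        if running >= target_fraction * total:
            break  # target share already reached
        selected.append(name)
        running += hours
    return selected, running / total


# Hypothetical core-hour totals per base code (illustrative numbers only).
usage = {"code_a": 500, "code_b": 300, "code_c": 120, "code_d": 50, "code_e": 30}
top, covered = codes_covering(usage, 0.75)
print(top, round(covered, 2))
```

When a few codes dominate the total, as in this made-up data set, porting effort concentrated on them covers most of the machine's workload.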
Additional early users and pioneers
Numerous HPC organizations have already publicly announced their commitment to this next step in the evolution of Cray multi-petascale computing, including luminaries like Argonne National Laboratory, the European Centre for Medium-Range Weather Forecasts, the aforementioned National Energy Research Scientific Computing Center at Lawrence Berkeley National Laboratory, Los Alamos National Laboratory and Sandia National Laboratories. These leading-edge research facilities continually push the frontier of the most challenging compute projects, which are increasingly constrained by growing data volumes and by the data movement and I/O these extreme applications require.
The Cray XC series software advantage
Cray has invested decades in developing a robust software tool chain. Combined with years of “many-core” expertise, this investment pays off in support for the new device’s higher core counts, additional threads per core and wider vector units. To optimize code execution performance, Cray provides a software stack that accelerates time to insight: the CrayPAT™ and Apprentice tools ease code analysis and identify bottlenecks, Reveal provides porting assistance recommendations, and the proven Cray compiler delivers auto-vectorization. The compiler also supports optimization for parallelism via directive-based OpenMP programming (making use of about 10 times more threads) and AVX-512 instructions with double the vector length (increasing speed by up to eight times). Together, Cray’s performance tools help applications exploit the silicon innovations of the Intel Xeon Phi processors. For specific examples, see this video of Heidi Poxon, technical lead of Cray’s Software Engineering team, as she instructs senior DOE programmers in the art of improving performance through “Adding Parallelism to HPC Applications.”
Additionally, the processor’s high-bandwidth memory can be configured as a cache or as directly addressable fast memory, and Cray has created a flexible software feature that lets a user configure the nodes at job launch. This capability enables Cray XC supercomputers to support a spectrum of usage modes, from writing new code, to tuning and rebuilding existing applications, all the way to simply loading and running pre-existing ISV codes.
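The “up to eight times” figure follows from simple lane arithmetic: a 512-bit AVX-512 register holds eight 64-bit double-precision values, so one vector instruction can do the work of eight scalar ones. The sketch below works that out and, as an assumption not taken from the text above, applies an Amdahl-style estimate to show why a code reaches that peak only if nearly all of its runtime vectorizes. The function names and sample fractions are illustrative.

```python
def vector_lanes(register_bits, element_bits):
    """Number of elements processed by one vector instruction."""
    return register_bits // element_bits


def estimated_speedup(vector_fraction, lanes):
    """Amdahl-style bound: only the vectorizable fraction of the
    runtime is assumed to run `lanes` times faster."""
    return 1.0 / ((1.0 - vector_fraction) + vector_fraction / lanes)


lanes = vector_lanes(512, 64)  # AVX-512 register, double-precision elements
for fraction in (0.5, 0.9, 1.0):  # illustrative vectorizable fractions
    print(f"{fraction:.0%} vectorized -> {estimated_speedup(fraction, lanes):.2f}x")
```

Under this simplified model, a code with half of its runtime in vectorizable loops gains less than 2x, which is why analysis tools that locate the loops worth vectorizing matter as much as the hardware width.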
Cray’s continued investment in programming environment and system management software innovation, decades of system design expertise and the integration of the latest processing technology from Intel combine to introduce our highest-performing supercomputer to date. This giant leap in Cray’s adaptive supercomputing strategy delivers a scalable production platform that supports state-of-the-art multicore and many-core processing technologies in the same architecture, better enabling users to implement the most optimized configuration to get the best performance results out of their diverse applications.