HPC Cluster Software Stack
Achieving exascale performance will require not only hardware innovations but also validated and tested software tools. In answer to this challenge, Cray delivers customer-centric solutions based on industry-standard hardware with a validated and proven HPC software stack, providing customers with leading application performance and system availability. The operating system and application software layers support x86 processors from Intel® and AMD, either alone or in hybrid configurations with NVIDIA® Tesla® GPU accelerators and Intel® Xeon Phi™ coprocessors.
The HPC cluster software stack for Cray CS300 cluster supercomputers consists of:
- Operating systems: Linux – Red Hat Enterprise Linux, CentOS and SUSE
- Middleware applications and management
- HPC programming tools
The operating system is the first level of the software stack. The HPC cluster software stack supports popular Linux distributions including Red Hat, CentOS, and SUSE. Because all of them are built on the Linux kernel, the main differences between the distributions are their installation programs and support offerings.
On the Cray CS300 cluster supercomputer, Cray Advanced Cluster Engine™ (ACE) management software is delivered with Red Hat Enterprise Linux installed on the management nodes. Red Hat was selected because of the high level of support available. For the compute nodes, Cray delivers preconfigured Red Hat, SUSE or CentOS images. Red Hat and SUSE are priced on a per-node basis, while CentOS is open source software that can be used at no cost.
Middleware Applications and Management
At the core of the HPC cluster software stack is Cray's Advanced Cluster Engine software, which provisions node operating systems using standard distributions, including Red Hat Enterprise Linux and its derivatives such as CentOS and Scientific Linux, as well as SUSE Linux. ACE also provides system-level health monitoring and management, power monitoring, and network management, including management of InfiniBand fabrics.
Cray builds on this stack with a strong base of industry-leading software, including resource managers and schedulers such as Grid Engine Enterprise Edition and SLURM (Simple Linux Utility for Resource Management). The stack is also compatible with other workload management and job scheduling software such as Altair’s PBS Professional, IBM Platform LSF and Torque/Maui.
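As a rough illustration of how work is handed to a scheduler such as SLURM, the sketch below assembles the text of a batch script. The helper name build_slurm_script, the job name, the node counts and the application path ./my_mpi_app are all hypothetical; only the #SBATCH options shown (--job-name, --nodes, --ntasks-per-node, --time) and the srun launcher are standard SLURM. Site-specific settings such as partitions or accounts are deliberately left out.

```python
def build_slurm_script(job_name, nodes, tasks_per_node, walltime, command):
    """Return the text of a minimal SLURM batch script.

    Hypothetical helper for illustration; it only formats text and
    does not contact a scheduler.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={tasks_per_node}",
        f"#SBATCH --time={walltime}",
        "",
        # srun launches the command across the resources allocated above
        f"srun {command}",
    ]
    return "\n".join(lines) + "\n"


# Assumed example values: 4 nodes, 16 tasks per node, 30 minute limit
script = build_slurm_script("demo", 4, 16, "00:30:00", "./my_mpi_app")
print(script)
```

On a system running SLURM, the generated text would typically be saved to a file and submitted with sbatch. Other schedulers named above use different directive syntax, so the same helper would not apply to them unchanged.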
Cray's HPC cluster software stack offers support for common file system options such as the Network File System (NFS), Panasas PanFS™ and Lustre. By providing multiple paths to the physical storage, Lustre enables the use of common storage technologies along with high-speed interconnects to scale well as an organization's storage needs grow.
The stack is completed with application libraries and message passing libraries such as MVAPICH, Open MPI and Intel MPI. A core part of the stack is Intel's latest development package, Cluster Studio XE, which combines best-of-breed MPI software, C and Fortran compilers with support for Intel® Xeon Phi™ coprocessors, debuggers, and performance analyzers.
HPC programming tools are the third level of the software stack. They support application generation, execution and debugging. Cray's ACE management software supports a complete set of HPC programming tools including compilers, libraries, and special software used to develop and test application software. Library routines implement operations common to many applications, such as math functions, and are optimized to make efficient use of processor capabilities. Libraries can be included wherever they're required, simplifying user programs and reducing programming time. The HPC cluster software stack can also include tools for debugging, performance testing and monitoring.
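The value of reusing a library routine for a common math operation can be seen even at small scale. The sketch below is an illustration only, using Python's standard math module rather than any library from the Cray stack; naive_sum is a hypothetical hand-written routine used for comparison.

```python
import math


def naive_sum(values):
    """Hand-written summation loop (hypothetical, for comparison only)."""
    total = 0.0
    for v in values:
        total += v  # rounding error accumulates on each addition
    return total


samples = [0.1] * 10

hand_written = naive_sum(samples)
library = math.fsum(samples)  # library routine that tracks lost precision

print("hand-written:", hand_written)
print("library:     ", library)
```

Here the library call replaces several lines of user code and also returns the correctly rounded result, which the simple loop does not. Optimized HPC math libraries apply the same idea to far larger operations, with tuning for the target processor.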