Scalable Many-Core Performance
Adaptive Hybrid Computing
The Cray XK7 compute node combines AMD's 16-core Opteron™ 6200 Series processor and the NVIDIA® Tesla® K20 GPU Accelerator to create a hybrid unit that offers intra-node scalability, the power efficiency of acceleration, and the flexibility to run applications with either scalar or accelerator components. Combined with the excellent inter-node scalability of the Gemini interconnect, this compute unit creates a system geared for a wide range of computing challenges.
Gemini Scalable Interconnect
Capable of tens of millions of MPI messages per second, the Gemini ASIC complements current and future massively multi-core and many-core processors. Each hybrid compute node is interfaced to the Gemini interconnect through HyperTransport™ 3.0 technology. This direct-connect architecture bypasses the PCI bottlenecks inherent in commodity networks and provides a peak of over 20 GB/s of injection bandwidth per node. The Gemini router’s connectionless protocol scales from hundreds to hundreds of thousands of cores without the increase in buffer memory required by the point-to-point connection method of commodity interconnects. The Cray XK7 network provides industry-leading sub-microsecond latency for remote puts and 1 to 2 microsecond latency for most other point-to-point messages. An internal block transfer engine provides high bandwidth and good overlap of computation and communication for long messages. Advanced features include support for one-sided communication primitives and atomic memory operations. The proven 3-D torus topology provides strong bisection and global bandwidth characteristics as well as support for dynamic routing of messages.
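To make the 3-D torus concrete, the sketch below models router positions as (x, y, z) coordinates in a torus whose dimensions `dims` are hypothetical and chosen only for illustration. Wraparound links give every position six neighbor links, and the minimal hop count along each axis is the shorter way around that ring. The function names are hypothetical; this is a geometric illustration of the topology, not Cray's routing algorithm.

```python
def torus_neighbors(coord, dims):
    """Return the six wraparound neighbor positions of a node in a 3-D torus.

    Duplicates appear if any dimension is smaller than 3.
    """
    result = []
    for axis in range(3):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % dims[axis]  # wrap around the ring
            result.append(tuple(n))
    return result


def torus_hops(a, b, dims):
    """Minimal hop count: on each axis, take the shorter way around the ring."""
    total = 0
    for axis in range(3):
        d = abs(a[axis] - b[axis])
        total += min(d, dims[axis] - d)
    return total


if __name__ == "__main__":
    dims = (8, 8, 8)  # hypothetical torus size, for illustration only
    print(torus_neighbors((0, 0, 0), dims))
    # [(7, 0, 0), (1, 0, 0), (0, 7, 0), (0, 1, 0), (0, 0, 7), (0, 0, 1)]
    print(torus_hops((0, 0, 0), (7, 4, 1), dims))  # 6: 1 + 4 + 1
```

Because each axis wraps around, the worst-case distance per axis is half the ring length, so an 8×8×8 torus has a diameter of 12 hops, compared with 21 for a mesh of the same size without wraparound links.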
Integrated Hardware Supervisory System
Cray's Hardware Supervisory System (HSS) integrates hardware and software components to provide system monitoring, fault identification and recovery. An independent system with its own control processors and supervisory network, the HSS monitors and manages all major hardware and software components in the Cray XK7 supercomputer. In addition to providing recovery services in the event of a hardware or software failure, HSS controls power-up, power-down and boot sequences, manages the interconnect, reroutes around failed interconnect links, and displays the machine state to the system administrator. The Cray XK7 system also supports a warm swap capability allowing a system operator to remove and repair system blades without disrupting an active workload.
Cray XK7 System Resiliency
The Gemini interconnect is designed for large systems in which failures are to be expected and applications must run to successful completion in the presence of errors.
Gemini uses error-correcting code (ECC) to protect major memories and data paths within the device. ECC, combined with the Gemini adaptive routing hardware (which spreads data packets over the four lanes that make up each torus link), improves both system and application resiliency. If a lane fails, the adaptive routing hardware automatically masks it out. If all connectivity between two Gemini routers is lost, the HSS automatically reconfigures the interconnect to route around the bad link.
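The two mechanisms in this paragraph, masking a failed lane and routing around a fully failed link, can be illustrated with a toy model. It assumes round-robin spreading across healthy lanes and breadth-first search for rerouting; both are stand-ins chosen for clarity, not the actual Gemini adaptive-routing hardware or the HSS reconfiguration procedure, and all names (`Link`, `build_links`, `route`) are hypothetical.

```python
from collections import deque
from itertools import product

LANES_PER_LINK = 4  # each torus link comprises four lanes (per the text)


class Link:
    """Toy torus link; traffic is spread across whichever lanes are healthy."""

    def __init__(self):
        self.healthy = [True] * LANES_PER_LINK
        self._next = 0  # round-robin cursor (hypothetical spreading policy)

    def mask_lane(self, lane):
        self.healthy[lane] = False

    @property
    def up(self):
        # The link stays usable as long as at least one lane survives.
        return any(self.healthy)

    def pick_lane(self):
        """Return the next healthy lane in round-robin order, skipping masked lanes."""
        if not self.up:
            raise RuntimeError("link down: all lanes masked")
        while True:
            lane = self._next
            self._next = (self._next + 1) % LANES_PER_LINK
            if self.healthy[lane]:
                return lane


def neighbors(coord, dims):
    """Wraparound neighbors of a coordinate in a torus of the given dimensions."""
    for axis in range(len(dims)):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % dims[axis]
            yield tuple(n)


def build_links(dims):
    """Create one Link object per undirected torus edge."""
    links = {}
    for coord in product(*(range(d) for d in dims)):
        for n in neighbors(coord, dims):
            links.setdefault(frozenset((coord, n)), Link())
    return links


def route(src, dst, dims, links):
    """Shortest path using only links that are still up (BFS); None if unreachable."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        cur = queue.popleft()
        if cur == dst:
            path = []
            while cur is not None:
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        for nxt in neighbors(cur, dims):
            if nxt not in prev and links[frozenset((cur, nxt))].up:
                prev[nxt] = cur
                queue.append(nxt)
    return None


if __name__ == "__main__":
    dims = (4, 4, 4)  # hypothetical torus size
    links = build_links(dims)
    a, b = (0, 0, 0), (1, 0, 0)
    link = links[frozenset((a, b))]

    link.mask_lane(2)  # one lane fails; traffic continues on the other three
    print([link.pick_lane() for _ in range(4)])  # [0, 1, 3, 0]
    print(route(a, b, dims, links))  # [(0, 0, 0), (1, 0, 0)]

    for lane in range(LANES_PER_LINK):  # the whole link fails
        link.mask_lane(lane)
    print(len(route(a, b, dims, links)) - 1)  # 3 hops: a detour around the bad link
```

In this model a single lane failure only narrows the link, while losing all four lanes takes the link out of the graph and forces a longer path.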
Additionally, the Cray Linux Environment features NodeKARE™ (Node Knowledge and Reconfiguration). If a program terminates abnormally, NodeKARE automatically runs diagnostics on all involved compute nodes and removes any unhealthy ones from the compute pool. Subsequent jobs are allocated only to healthy nodes and run reliably to completion.
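The NodeKARE workflow described here (diagnose the nodes of an abnormally terminated job, remove unhealthy ones, and allocate later jobs only from healthy nodes) can be sketched as a small node pool. This is a hypothetical model: `NodePool`, the `diagnose` callback, and the use of a nonzero exit code to signal abnormal termination are assumptions for illustration, not NodeKARE's actual interface.

```python
from typing import Callable, Iterable, List, Set


class NodePool:
    """Hypothetical compute-node pool modeling the NodeKARE behavior in the text."""

    def __init__(self, nodes: Iterable[int]):
        self.available: Set[int] = set(nodes)
        self.quarantined: Set[int] = set()

    def allocate(self, count: int) -> List[int]:
        """Give a new job `count` nodes, drawn only from the healthy pool."""
        if count > len(self.available):
            raise RuntimeError("not enough healthy nodes")
        chosen = sorted(self.available)[:count]
        self.available.difference_update(chosen)
        return chosen

    def release(self, nodes: Iterable[int], exit_code: int,
                diagnose: Callable[[int], bool]) -> None:
        """Return a job's nodes to the pool.

        A nonzero exit code stands in for abnormal termination: in that case every
        node the job used is diagnosed, and failing nodes are quarantined.
        """
        for node in nodes:
            if exit_code != 0 and not diagnose(node):
                self.quarantined.add(node)  # unhealthy: removed from the compute pool
            else:
                self.available.add(node)


if __name__ == "__main__":
    pool = NodePool(range(8))
    job = pool.allocate(4)  # [0, 1, 2, 3]
    faulty = {2}  # pretend diagnostics will fail on node 2
    pool.release(job, exit_code=1, diagnose=lambda n: n not in faulty)
    print(sorted(pool.quarantined))  # [2]
    print(pool.allocate(7))  # [0, 1, 3, 4, 5, 6, 7]
```

Diagnostics run only after an abnormal exit, so a job that completes cleanly returns its nodes without the extra check, matching the trigger described in the text.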
The Lustre file system can be configured with object storage target failover and metadata server failover. Software failover is provided for all critical system software functions.
Extreme Scale and Cluster Compatibility in One System
The Cray XK7 system provides complete workload flexibility. For the first time, you can buy a single machine to run both a highly scalable custom workload and an industry-standard ISV workload. The Cray Linux Environment (CLE) accomplishes this through Cluster Compatibility Mode (CCM). CCM provides out-of-the-box compatibility with Linux/x86 versions of ISV software, without recompilation or relinking, and allows the use of various MPI implementations (e.g., MPICH, Platform MPI™). At job submission, you can request that the Compute Node Linux (CNL) compute nodes be configured with CCM, complete with the services needed to ensure Linux/x86 compatibility. The service is dynamic and available on a per-job basis.
Support for Other File System and Data Management Services
You can select the Lustre parallel file system or another option, including connecting to an existing parallel file system. The Cray Data Virtualization Service projects various other file systems (including NFS, GPFS™, Panasas® and StorNext®) to the compute and login nodes on the Cray XK7 system. The Cray Data Management group can also provide solutions for backup, archiving and data lifecycle management.
Emphasis on Power Efficiency
Many-core processing is the key to ultimate energy efficiency. Applications using the Cray XK7's GPU accelerators will experience industry-leading energy efficiency when measured on real application workloads. Combined with our standard air- or liquid-cooled High Efficiency cabinet and optional ECOphlex™ technology, the Cray XK7 system can reduce cooling costs and increase flexibility in datacenter design. Each High Efficiency cabinet can be configured with inline phase-change evaporator coils, which extract virtually all the heat imparted to the airstream as it passes through the cabinet. The coolant is then recondensed in a heat exchange unit connected to the building's chilled water supply.
ECOphlex technology accommodates a range of building water temperatures, so a modern datacenter can operate chillers and air handlers less often, reducing electrical costs. In fact, a system fitted with ECOphlex operating at full capacity needs only cooling towers during much of the year in many climates.
Investment Protection and Blended Systems
The Cray XK7 supercomputer is engineered for easy, flexible upgrades and expansion – prolonging its productive lifetime and your investment. As new technologies become available, you can take advantage of these next-generation compute processors, I/O technologies and interconnect without replacing the entire Cray XK7 system. In addition, Cray XK7 and Cray XE6 systems support blended configurations on the same Gemini interconnect and share the same power, cooling, I/O and service infrastructure, making it easy for current Cray XE6 users to add Cray XK7 technology.