Advanced Cluster Engine (ACE)

Cray’s Advanced Cluster Engine (ACE) management software is designed specifically for the Cray CS300 cluster supercomputer series and is part of the HPC cluster software stack. ACE is a complete management suite built to eliminate the complexity of managing an HPC cluster while providing all the tools necessary to run large, complex applications.

ACE includes a command-line interface (CLI) and a graphical user interface (GUI), giving the cluster administrator flexibility. The intuitive, easy-to-use ACE GUI connects directly to the ACE Daemon on the management server and can run on a remote system running Linux, Windows or Mac OS. Management modules include network, server, cluster and storage management.

ACE software supports:

  • Multiple network topologies and configurations with or without local disks
  • Network failover for maximum reliability
  • Customizable HPC development environment for industry-standard platforms and software configurations
  • Heterogeneous nodes with different software stacks
  • System power and temperature monitoring
Cray Advanced Cluster Engine (ACE) Management Software

  • System Control Manager (SCM): Remote management, the cornerstone of ACE, provides high availability, node provisioning, node image revision control, a scalable root file system and the interface to the GUI and CLI. It aggregates status information from all nodes and provides automated user management for the entire cluster.
  • Cluster Management: Supports dynamic sizing, location assignment and monitoring, as well as automatic MAC address management. ACE can create multiple clusters, each with a unique node image, allowing different operating system, kernel or package versions to be operational at once. Up to 10 node image revisions per cluster can be maintained, with the ability to roll back if necessary.
  • Server Management: Currently supports up to 35,000 compute or hybrid compute nodes in dual or quad CPU configurations.
  • Network Management: Supports redundant networks: InfiniBand for high-speed application communication between compute nodes, and Gigabit Ethernet for the operational and management networks. The operational network can also connect to an optional 10Gb Ethernet global file system or provide external high-bandwidth communications.
  • Storage Management: Supports scalable root file systems for diskless nodes, multiple global storage configurations, high bandwidth to secondary storage, and reporting of storage server status to the management server.
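The per-cluster image revision control with rollback can be illustrated with a minimal conceptual sketch. This is a hypothetical illustration only, not ACE's actual API; the `ImageRevisions` class and its method names are invented for the example:

```python
# Conceptual sketch (NOT ACE's actual interface) of per-cluster node image
# revision control: keep up to 10 revisions and allow rollback.

class ImageRevisions:
    """Tracks node image revisions for one cluster, newest last."""
    MAX_REVISIONS = 10  # ACE maintains up to 10 revisions per cluster

    def __init__(self):
        self._revisions = []  # list of (revision_id, description), oldest first
        self._next_id = 1

    def commit(self, description):
        """Record a new image revision, evicting the oldest if at capacity."""
        rev = (self._next_id, description)
        self._next_id += 1
        self._revisions.append(rev)
        if len(self._revisions) > self.MAX_REVISIONS:
            self._revisions.pop(0)  # drop the oldest retained revision
        return rev[0]

    def current(self):
        """Return the newest retained revision."""
        return self._revisions[-1]

    def rollback(self):
        """Discard the newest revision and return to the previous one."""
        if len(self._revisions) < 2:
            raise RuntimeError("no earlier revision to roll back to")
        self._revisions.pop()
        return self.current()

# Example: commit 12 revisions (only the last 10 are kept), then roll back.
images = ImageRevisions()
for i in range(12):
    images.commit(f"kernel-update-{i}")
print(images.current())   # -> (12, 'kernel-update-11')
print(images.rollback())  # -> (11, 'kernel-update-10')
```

The bounded history mirrors the documented limit of 10 revisions per cluster; rolling back simply reverts the cluster to the previous retained image.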

ACE supports the following hardware options, enabling an application foundation optimized for each installation while simplifying network, server, cluster and storage management and administration.

Management Components
- High availability and sub-management servers
- Dual-rail management networks
- Multi-tiered sub-management servers support more than 16,284 compute nodes

Compute Nodes
- Intel® Xeon® processor
- Intel® Xeon Phi™ coprocessors
- AMD Opteron™
- NVIDIA® Tesla® GPU accelerators
 
Storage
- Lustre/Panasas
- Fibre Channel, InfiniBand, GigE/10GigE

Interconnects
- Single and dual-rail InfiniBand (QDR, FDR)
- Single and dual-rail Ethernet (GigE/10GigE)

External Communication Devices (ECD)
- Operational network
- Login nodes
- Visualization
- GigE/10GigE
- Firewalls
- Storage

Cray HPC Cluster Software Stack