The San Diego Supercomputer Center (SDSC) at the University of California San Diego enables international science and engineering discoveries through advances in computational science and data-intensive high performance computing. Continuing this legacy into the era of cyberinfrastructure, SDSC is considered a leader in data-intensive computing, providing resources, services and expertise to the national research community. The mission of SDSC is to extend the reach of scientific accomplishments by providing tools such as high performance hardware technologies, integrative software technologies, and deep interdisciplinary expertise.
Challenge: Meeting the Demands of Data-Intensive Computing
The volume of data generated today is straining the capabilities of traditional FLOPS-oriented systems and raising questions about whether they are sufficient for many research inquiries. Increasingly, researchers need to move data from disk to processor at rates and in volumes beyond what many current architectures can sustain.
In response to the need for systems geared for data-intensive applications, the National Science Foundation (NSF) issued a request for proposals for a data-intensive high performance computing (HPC) system in 2008. Specifically, a winning system would need to be optimized to support research involving very large datasets or very large I/O requirements. Additionally, the solicitation required the system to achieve a total peak computing capacity of at least 200 teraflops.
Solution: Cray CS300-AC Cluster Supercomputer
SDSC proposed and won a five-year, $20 million grant to build and operate “Gordon,” a data-intensive HPC system based on the Cray CS300-AC™ cluster supercomputer. Selected for its reliability, configuration flexibility and compatibility, the Cray CS300-AC architecture helps SDSC bridge the widening latency gap between main memory and rotating disk storage in modern computing systems. Along with a large addressable virtual memory and a user-friendly programming environment, Gordon employs flash memory to provide a level of dense, affordable, low-latency storage that can be configured as either extended swap space or a very fast file system.
The Gordon project was rolled out in two phases. First, SDSC debuted “Dash,” a prototype system that served as a platform for testing the innovative features planned for the Gordon production system: new Intel processors, emerging interconnect products and topologies, and solid-state drives (SSDs).
The Gordon production system came next. A much larger version of Dash, the Gordon compute cluster features 341 teraflops of peak performance, 64 terabytes of DRAM, 300 terabytes of flash memory and 4 petabytes of disk storage, all connected by a dual-rail QDR InfiniBand 3D torus network. Overall, the system is capable of handling massive databases while running some queries up to 100 times faster than systems based on hard disk drives.
Gordon is composed of 1,024 compute nodes based on the Intel® Xeon® processor E5 product family and 64 I/O nodes based on the Intel® Xeon® X5650 processor. Each dual-socket compute node has two 8-core 2.6 GHz Intel Xeon E5 processors and 64 gigabytes of DDR3-1333 memory, plus an 80-gigabyte Intel® solid-state drive that is used as the system disk. Each I/O node has two 6-core 2.67 GHz Intel Xeon X5650 processors, 48 gigabytes of DDR3-1333 memory and sixteen 300-gigabyte Intel® Solid-State Drive 710 Series drives. The network topology is a dual-rail 4x4x4 3D torus of switches, with adjacent switches connected by three 4x QDR InfiniBand links (120 Gbit/s in total). Compute nodes (16 per switch) and I/O nodes (1 per switch) are connected to the switches by 4x QDR links (40 Gbit/s).
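The node counts and capacities quoted above follow directly from the switch topology. The short Python sketch below is illustrative only: it models one rail of the 4x4x4 torus and re-derives the totals from the per-switch and per-node figures in the text. The names (`torus_neighbors`, `DIMS` and so on) are hypothetical and are not taken from any SDSC or Cray tooling.

```python
from itertools import product

# Figures taken from the text; variable names are illustrative assumptions.
DIMS = (4, 4, 4)            # switches along each axis of the 3D torus
COMPUTE_PER_SWITCH = 16
IO_PER_SWITCH = 1

def torus_neighbors(coord, dims=DIMS):
    """Return the switches adjacent to `coord`, with wraparound on every axis."""
    neighbors = set()
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % size
            neighbors.add(tuple(n))
    return neighbors

switches = list(product(*(range(d) for d in DIMS)))
num_switches = len(switches)                        # 4 * 4 * 4 = 64
compute_nodes = num_switches * COMPUTE_PER_SWITCH   # 1,024
io_nodes = num_switches * IO_PER_SWITCH             # 64

dram_gb = compute_nodes * 64        # 64 GB of DRAM per compute node
flash_gb = io_nodes * 16 * 300      # sixteen 300 GB SSDs per I/O node
inter_switch_gbits = 3 * 40         # three 4x QDR links between adjacent switches

print(num_switches, compute_nodes, io_nodes)
print(dram_gb, "GB DRAM,", flash_gb, "GB flash,", inter_switch_gbits, "Gbit/s per hop")
```

Every switch in a torus of this size has six distinct neighbors, two per axis. The raw flash total of 307,200 GB is what the text rounds to 300 terabytes.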
Each compute node in the Gordon system is a two-socket design built on the Intel Xeon processor E5 family with Integrated I/O, which supports the PCI Express 3.0 specification and Intel® Data Direct I/O technology. Data Direct I/O lets the processor intelligently and dynamically determine the optimal path for I/O traffic based on overall system utilization while allowing system memory to remain in a low power state. This feature reduces latency bottlenecks while improving system performance and memory bandwidth. Gordon also takes advantage of Intel® Advanced Vector Extensions (AVX) to achieve eight floating-point operations per clock cycle per core, twice the throughput of earlier processors without AVX at the same core count and frequency. This provides dramatic performance improvements for scientific applications that are dominated by floating-point operations.
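As a rough check, the 341-teraflop peak figure can be reproduced from the compute node specification and the eight floating-point operations per cycle enabled by AVX. The sketch below is a back-of-the-envelope calculation, not a benchmark. It assumes that only the 1,024 compute nodes count toward the peak (the I/O nodes are excluded), and the function name `peak_teraflops` is hypothetical.

```python
def peak_teraflops(nodes, sockets_per_node, cores_per_socket, ghz, flops_per_cycle):
    """Theoretical peak = total cores x clock rate (GHz) x flops per cycle, in teraflops."""
    gigaflops = nodes * sockets_per_node * cores_per_socket * ghz * flops_per_cycle
    return gigaflops / 1000.0

# Compute-node figures from the text; I/O nodes are excluded (an assumption).
with_avx = peak_teraflops(1024, 2, 8, 2.6, 8)
# Same hardware at four flops per cycle, for comparison with a pre-AVX design.
without_avx = peak_teraflops(1024, 2, 8, 2.6, 4)

print(f"Peak with AVX:    {with_avx:.1f} TF")
print(f"Peak without AVX: {without_avx:.1f} TF")
```

With AVX the result is about 340.8 teraflops, which rounds to the quoted 341. Halving the operations per cycle halves the peak, which is the sense in which AVX doubles floating-point throughput.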
Another key feature of Gordon is its large-memory “supernodes.” Each supernode consists of 32 compute nodes and two I/O nodes, with each I/O node providing 4 terabytes of flash memory. These supernodes create large shared-memory systems that are capable of presenting more than 2 terabytes of cache-coherent memory via virtual shared-memory software provided by ScaleMP, Inc.
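The supernode figures are consistent with the node counts given earlier. The following Python sketch simply divides the system into supernodes and sums the DRAM in each one. It is an illustration using numbers from the text, the variable names are hypothetical, and it ignores any memory the virtual shared-memory software may reserve for itself.

```python
# Figures from the text; names are illustrative assumptions.
TOTAL_COMPUTE_NODES = 1024
TOTAL_IO_NODES = 64
COMPUTE_PER_SUPERNODE = 32
IO_PER_SUPERNODE = 2
DRAM_GB_PER_COMPUTE_NODE = 64

supernodes = TOTAL_COMPUTE_NODES // COMPUTE_PER_SUPERNODE    # by compute nodes
supernodes_by_io = TOTAL_IO_NODES // IO_PER_SUPERNODE        # by I/O nodes
dram_gb_per_supernode = COMPUTE_PER_SUPERNODE * DRAM_GB_PER_COMPUTE_NODE

print(supernodes, supernodes_by_io, dram_gb_per_supernode)
```

Both divisions give 32 supernodes, and the 2,048 GB of aggregate DRAM per supernode matches the "more than 2 terabytes" of shared memory described above.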
Along with its innovative supernodes, Gordon is the first high-performance computing system to employ SSDs on a massive scale. Each I/O node is capable of more than 560,000 IOPS, or more than 35 million IOPS for the entire system. The use of flash-based memory (common in smaller devices such as cell phones) instead of slower spinning disks means Gordon can do latency-bound file reads about 10 times faster than other supercomputers.
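The system-wide IOPS figure is the per-node figure multiplied across all I/O nodes. A minimal Python sketch, assuming perfectly linear scaling across the 64 I/O nodes (an idealization) and using hypothetical variable names:

```python
IO_NODES = 64
IOPS_PER_IO_NODE = 560_000   # lower bound quoted in the text

# Assumes aggregate IOPS scale linearly with the number of I/O nodes.
system_iops = IO_NODES * IOPS_PER_IO_NODE
print(f"{system_iops:,} IOPS (~{system_iops / 1e6:.1f} million)")
```

This gives 35,840,000 IOPS, in line with the figure of roughly 35 million quoted for the whole system.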
Gordon is part of the NSF’s Extreme Science and Engineering Discovery Environment (XSEDE) program, a nationwide partnership of 16 supercomputers and high-end visualization and data analysis resources. Since entering production in early 2012, Gordon’s unique architecture has been instrumental in advancing research in fields ranging from climate science and large graph problems to quantum chemistry and the analysis of stock market trends.