How do Cray customers name their systems? An occasional look at the stories behind the names.
From 1831 to 1836, Charles Darwin journeyed around the world as the naturalist aboard the HMS Beagle. His diary of that expedition, detailing the specimens he collected and the revolutionary ideas they sparked, was collected in “The Voyage of the Beagle,” published in 1839. Twenty years later, those ideas would mature into “On the Origin of Species,” Darwin’s argument for natural selection and the dawn of modern evolutionary biology.
The voyage of our Beagle began in 2010, when the Computation Institute — a joint initiative of the University of Chicago (UChicago) and Argonne National Laboratory — purchased the Cray® XE6™ supercomputer with National Institutes of Health funding. From its launch, Beagle carried a specific vision: to provide high-performance computation, simulation and data analysis for the biomedical research community. Just as the HMS Beagle was a scientific “instrument” that propelled Darwin to his landmark theory, our Beagle helps researchers studying genomics, biochemistry, neurobiology, medicine and more make significant leaps toward groundbreaking discoveries.
Beagle allows its users to run models and analyses too large and complex for traditional computer resources — projects that connect related and unrelated research fields and data collected at different scales of time and space. The system was acquired for several types of uses, including:
- Floating-point/MPI-intensive computations such as molecular dynamics (e.g., NAMD, GROMACS);
- Agent-based models of sepsis and neuronal simulations;
- Computationally intensive analysis of medium-sized datasets (up to a few TB), such as image processing (using MATLAB, Python and custom C++ code);
- Complex Bayesian estimation analysis (using R) and neurobiology spike-train and other experimental data analysis; and
- I/O-intensive genomics analysis of large whole-genome and exome datasets (hundreds of TBs), such as The Cancer Genome Atlas, based on complex workflows and standard packages (e.g., BWA, GATK and Picard).
Our main goal was not simply to provide cycles and a computational facility, but rather a system accessible to users with limited or nonexistent computational experience. With Beagle (and its dedicated staff), researchers new to HPC can perform the computations necessary to analyze or model their systems at a scale compatible with the requirements of modern science, without having to develop years of experience and a staff to match.
Researchers have moved rapidly to make effective use of this advanced instrument, producing scientific advances in multiple areas including: cancer therapy, congenital cardiomyopathy gene analysis, the genetic architecture of asthma, the molecular basis of Alzheimer disease, neocortical simulations, characterization of neuronal networks, advanced breast cancer image analysis, reconstruction of metabolic models, protein structure prediction, template-based protein modeling, protein-RNA modeling and metagenomics. Nearly 100 conference proceedings and peer-reviewed papers have acknowledged the use of this facility.
Specs: Cray® XE6™ supercomputer. Beagle was installed in 2010 as a 150 TF, 18,000-core system. It was renamed Beagle-2 after being upgraded in 2014 to 24,000 cores and 250 TF to allow our biochemists to investigate bigger systems on longer time scales. The memory per node was doubled and the disk space quintupled to 2.5 PB to accommodate the needs of genomics research.
Together, we designed a system that would require as little effort as possible for our users to transition from the old system: the same scheduler, same resource manager, same file system, and the same compilers and libraries. We also focused on upgrading the parts that were of highest relevance to our projects, and thus more likely to be cost effective. The modular approach used by Cray greatly helped to make this possible. Finally, we added a few NVIDIA Kepler GPUs to give our users the opportunity to develop the skills and understanding needed to move to many-core architectures, which will likely underlie future systems.