Processing and memory bottlenecks can run but they can’t hide. Not indefinitely, at least. And especially not when four technology leaders combine efforts against them.
Cray, Livermore Software Technology Corporation (LSTC), the National Center for Supercomputing Applications (NCSA) and Rolls-Royce are partnering on an ongoing project to explore the future of implicit finite element analyses of large-scale models using LS-DYNA, a multiphysics simulation software package, and Cray supercomputing technology. As finite element models and the systems they run on grow in scale, so do scaling issues and the time it takes to run a model.
Understanding that, ultimately, only time and resource constraints limit the size and complexity of implicit analyses (and, consequently, the insights they offer), Cray, LSTC, NCSA and Rolls-Royce are focusing on identifying what constrains these models as they scale and applying what they learn to enhance LS-DYNA.
They’re making some surprising discoveries.
For the project, Rolls-Royce created a family of dummy engine models using solid elements and as many as 200 million degrees of freedom. Then NCSA ran the models with specialized LS-DYNA variants on “Blue Waters,” their Cray supercomputer system. The biggest challenge they’ve uncovered so far is the need to be able to reorder extremely large sparse matrices to reduce factorization storage and operations. This discovery has led to broader changes in LS-DYNA as well as the exposure of some unexpected bugs in the software.
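To see why ordering matters so much for factorization storage, consider a toy illustration. The sketch below is not LS-DYNA's or LS-GPart's algorithm; it simply plays the textbook "elimination game" on the graph of a small symmetric sparse matrix and counts the fill-in (new nonzeros) that a given elimination order would create. The helper names `count_fill` and `arrow_graph` are hypothetical and exist only for this example.

```python
def count_fill(adjacency, order):
    """Count fill-in edges created by eliminating vertices in `order`."""
    # Work on a copy so the caller's graph is not modified.
    graph = {v: set(nbrs) for v, nbrs in adjacency.items()}
    fill = 0
    for v in order:
        nbrs = graph.pop(v)
        for u in nbrs:
            graph[u].discard(v)
        # Eliminating v couples all of its remaining neighbours pairwise;
        # every pair that was not already coupled is a new nonzero (fill).
        nbr_list = sorted(nbrs)
        for i, a in enumerate(nbr_list):
            for b in nbr_list[i + 1:]:
                if b not in graph[a]:
                    graph[a].add(b)
                    graph[b].add(a)
                    fill += 1
    return fill


def arrow_graph(n):
    """Graph of an n-by-n 'arrow' matrix: vertex 0 is coupled to every other vertex."""
    adjacency = {v: set() for v in range(n)}
    for v in range(1, n):
        adjacency[0].add(v)
        adjacency[v].add(0)
    return adjacency


n = 6
graph = arrow_graph(n)
hub_first = count_fill(graph, list(range(n)))          # eliminate the hub first
hub_last = count_fill(graph, list(range(1, n)) + [0])  # eliminate the hub last
print(hub_first, hub_last)  # prints "10 0"
```

The same matrix produces ten fill-in entries with one ordering and none with the other. On models with hundreds of millions of degrees of freedom, that gap translates directly into memory and floating-point work, which is why fill-reducing orderings such as nested dissection matter at this scale.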
After some initial adjustments to LS-DYNA, including integrating LS-GPart (an alternative nested dissection strategy based on half-level sets), the group ran a small dummy engine model with 105 million degrees of freedom to begin analyzing how the software scales. Access to Blue Waters’ thousands of cores was a rare opportunity both to observe performance at this scale and to optimize for it, and the researchers did both. The runs uncovered three scaling bottlenecks: input processing by MPI rank 0, symbolic factorization, and constraint processing.
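Work that is funneled through a single rank, such as the rank 0 input processing named above, limits scaling in the way Amdahl's law describes. The sketch below is a generic illustration of that law. The 5% serial fraction is a made-up number chosen for the example, not a measurement from the project, and `amdahl_speedup` is a hypothetical helper name.

```python
def amdahl_speedup(serial_fraction, workers):
    """Ideal speedup when `serial_fraction` of the runtime cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)


# Assumption for illustration only: 5% of the single-process runtime
# is serial work performed by one rank while the others wait.
serial_fraction = 0.05
for workers in (16, 256, 4096):
    print(workers, round(amdahl_speedup(serial_fraction, workers), 1))
```

Under that assumption the speedup can never exceed 20 no matter how many cores are added, which is why a step that looks cheap on a workstation can dominate a run spread across thousands of cores.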
The group has also run one large engine model. On a Cray system with newer nodes, each with 192 GB of memory, the large model ran in 12 hours on 64 nodes.
See how the collaborators reached their conclusions. Read the full white paper, “Increasing the Scale of LS-DYNA Implicit Analysis,” here.