MSC Nastran is a widely used structural analysis application, especially for large modal analysis (i.e., eigenvalue) simulations. It requires a high-capability I/O system for good throughput performance. However, good I/O performance on a cluster architecture can be a challenge, since clusters are often configured to maximize compute scalability and have relatively weak I/O capability per node.
To address this challenge, Cray introduced the Cray® DataWarp™ I/O acceleration capability, offered in Cray® XC40™ supercomputers. The DataWarp applications I/O accelerator leverages features of the Cray Linux® environment, solid-state drives (SSDs) and the Cray high-speed network to enable high-performance I/O at every node in the system without the need for SSDs on every node. The DataWarp accelerator also works with the underlying parallel file system (e.g., Lustre®) to automatically stage data into and out of the flash storage tier as jobs begin and end.
Unlike computational fluid dynamics and explicit structural applications (e.g., crash simulations), most implicit structural simulations do not scale to hundreds of cores, so I/O performance is critical. In this post I’ll share the results of a recent test that put Cray’s DataWarp technology to work on a large MSC Nastran simulation.
Improved I/O performance for NVH
This example is based on an MSC Nastran “noise, vibration and harshness” (NVH) simulation of an automotive floor pan. NVH simulation is a core technology in the automotive design process, and fast turnaround is especially important. The simulation was run twice: first using only the spinning disks of a high-performance Lustre file system for I/O, and then using the DataWarp feature on the XC40 system. DataWarp acceleration significantly reduced I/O time and, most importantly, the simulation finished in half the elapsed time.
With over 19 million degrees of freedom, the MSC Nastran NVH simulation is typical of the model sizes now common in the automotive industry. The Lanczos algorithm was used to extract the first 10 modes. That’s a relatively small number of modes for a model this size, yet with Lustre alone the I/O requirements still account for over half the execution time. In this example the simulation elapsed time dropped from 17,000 seconds with Lustre alone to 8,500 seconds with DataWarp technology.
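The halving of elapsed time is consistent with simple Amdahl-style arithmetic. Here is a rough sketch; the exact I/O fraction and the DataWarp I/O speedup below are assumed illustrative values, not measurements from the run:

```python
total_lustre = 17_000        # elapsed seconds, Lustre only (from the run above)
io_fraction = 0.55           # "over half" of execution time in I/O -- assumed 55%

compute_s = total_lustre * (1 - io_fraction)   # ~7,650 s of compute
io_lustre_s = total_lustre * io_fraction       # ~9,350 s of I/O on Lustre

# Assume DataWarp cuts the I/O time by roughly 90% (an illustrative figure):
io_dw_s = io_lustre_s * 0.10

print(round(compute_s + io_dw_s))  # ~8,585 s, close to the observed 8,500 s
```

Because the I/O share dominates, even a large further speedup of the I/O tier would yield diminishing returns; the compute portion becomes the new floor.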
What requires this much I/O? Analysis of the simulation showed that I/O time was dominated by scratch files that the Lanczos eigenvalue algorithm reads forward and backward multiple times. Lustre is optimized for streaming data sequentially forward through a file; reading backward pushes the load into a more transactional, random-access pattern. The DataWarp accelerator’s solid-state storage and specialized file system work well in this domain and allow the backward reads to go dramatically faster.
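To see why direction matters, here is a minimal Python sketch (not taken from the benchmark) contrasting the two access patterns: the forward pass issues strictly sequential reads, while the backward pass must seek before every chunk, which defeats sequential prefetching:

```python
import os

CHUNK = 1 << 20  # read in 1 MiB chunks


def read_forward(path):
    """Sequential forward read -- the pattern streaming file systems
    (and their prefetchers) are optimized for."""
    total = 0
    with open(path, "rb") as f:
        while True:
            buf = f.read(CHUNK)
            if not buf:
                break
            total += len(buf)
    return total


def read_backward(path):
    """Backward read -- a seek before every chunk. With no forward
    locality to exploit, prefetching cannot help and each read behaves
    more like a random, transactional access."""
    total = 0
    end = os.path.getsize(path)
    with open(path, "rb") as f:
        while end > 0:
            start = max(0, end - CHUNK)
            f.seek(start)
            total += len(f.read(end - start))
            end = start
    return total
```

Timing these two loops on a file much larger than the client cache, over a streaming-oriented file system, shows the forward pass benefiting from prefetch while the backward pass does not; on SSD-backed storage the gap largely disappears, which is exactly the effect DataWarp exploits.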
In this case, DataWarp technology is the perfect complement to Lustre to offload the I/O-intensive portions of Nastran to the flash storage tier.
This is illustrated in the following image, generated with the IOT toolkit from I/O Doctors, LLC. The two plots depict MSC Nastran’s read and write access patterns over time for the SCR300 scratch file. The upper plot shows the job using DataWarp technology; the lower plot shows the job using Lustre alone. Note the behavior of the Lustre job during the backward read once the client cache is exceeded: the data delivery rate drops sharply, as indicated by the shallow slope, because Lustre does no prefetching when a file is read backward.
DataWarp accelerator overview
Of course, using SSDs to reduce I/O time is not a new approach. Indeed, Cray first introduced SSDs in the early 1980s, and in previous blog posts we have discussed the performance of the Cray® CS400™ system running MSC Nastran with SSDs configured on the node.
What is new is the use of the DataWarp I/O accelerator. A key advantage of DataWarp technology is that you do not have to configure an SSD on each node of the system to get SSD performance on every node. SSDs offer excellent I/O performance, but they cost significantly more than spinning disks and are therefore expensive to configure if they will not be fully utilized. With DataWarp technology, a limited number of SSDs are configured in the system and can be allocated at run time to the nodes and applications that require enhanced I/O performance. This is achieved through a combination of Cray technologies:
- The XC40 system’s high-speed interconnect for fast data movement between nodes
- Cray Data Virtualization Service (DVS), an I/O-forwarding software infrastructure used to project an underlying file system to a group of compute nodes at run time
- A tightly integrated hardware and software environment for HPC efficiency
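In practice, a job requests its DataWarp allocation through batch-script directives that the workload manager interprets. The following is a minimal sketch based on the `#DW` directive style used by the Slurm DataWarp burst-buffer plugin; the capacity, paths and Nastran invocation here are illustrative, not taken from the benchmark above:

```shell
#!/bin/bash
#SBATCH --nodes=4

# Request a job-lifetime, striped DataWarp scratch allocation (size is illustrative).
#DW jobdw type=scratch access_mode=striped capacity=1TiB

# Stage the model in from Lustre before the job starts...
#DW stage_in  type=directory source=/lus/scratch/user/nvh_model destination=$DW_JOB_STRIPED/nvh_model

# ...and copy results back out to Lustre when it ends.
#DW stage_out type=directory source=$DW_JOB_STRIPED/nvh_model destination=/lus/scratch/user/nvh_results

# Point Nastran's scratch I/O (e.g., the SCR300 file) at the DataWarp mount,
# which the plugin exposes via the $DW_JOB_STRIPED environment variable.
srun nastran model.dat scratch=yes sdirectory=$DW_JOB_STRIPED/nvh_model
```

Because the allocation exists only for the lifetime of the job, the SSDs return to the shared pool as soon as the stage-out completes, which is what lets a limited number of drives serve many jobs.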
Cray systems are engineered for production HPC environments, which involve running a wide variety of applications with a wide variety of performance requirements. As users increase the fidelity of their simulations, including those that do not scale well across nodes, Cray’s DataWarp technology can help them address the full range of HPC applications.