Extreme Scaling in CAE Applications

With all the change that has happened in the high performance computing (HPC) environments used in manufacturing over the past 20 years, two constants have remained: 1) the computer-aided engineering (CAE) workload – and hence the demand for compute power – has nearly doubled every year; and 2) the names of the ISV applications used today for CAE simulations are largely the same as 20 years ago (e.g. NASTRAN, Abaqus, Fluent, LS-DYNA). The first constant is a direct result of CAE becoming a core element of the product design process. The second is a result of the constant enhancements in both features and performance made by the ISVs.

The Move to Extreme Scalability

It is the HPC performance, and specifically the parallel performance of these applications, that I’d like to highlight. For many years CAE applications have run MPI parallel, but we are now seeing extreme scalability in the CAE applications (i.e. scaling to > 1000 cores) which is going to drive significant changes in CAE simulation.

For most of the past 30 years, the CAE field has met the increasing demand for performance by relying on Moore's Law – or, more specifically, on the steady increase in processor frequency. In recent years processor frequency has plateaued, and the HPC industry has moved to "multi-core" processors to deliver more FLOPS per processor. More cores per processor, combined with the move to cluster architectures, have resulted in a huge increase in total compute power. Today it is common for large commercial CAE environments to have 10,000 cores available, and several organizations have more than 50,000 cores. Of course, leveraging the increased performance of a multi-core processor implies that each analysis uses more compute cores per simulation (i.e. more parallel scaling). However, most manufacturing organizations have used the increase in total compute power to increase overall throughput (i.e. capacity computing) rather than to scale up the performance of individual simulations (i.e. capability computing).

As the requirement for higher simulation fidelity is combined with tighter design schedules, there is a growing demand for significantly faster turnaround of large jobs – not 10 percent faster but 10 times faster. With minimal increase in processor frequency, the best path to improved performance is parallel scaling of CAE simulations. Recently the CAE ISVs have heard the demand for dramatically better scaling from their leading-edge users, and as a result there have been significant advances in application scalability. So the good news is that, although today most users rarely use more than 256 cores, many compute-intensive applications can scale large CAE simulations to 2,500 cores or more.

However, for a new HPC technology to take hold in CAE there needs to be a high-value simulation field (e.g. crash simulation or aerodynamics) that can leverage the technology. Computational fluid dynamics (CFD) will likely drive the move to extreme scaling. As an example, ANSYS has worked with Cray to improve Fluent scaling in recent versions to over 10,000 cores.

Cray and ISV Collaboration

Overall, we are seeing major scaling improvements in other CAE codes, across all simulation fields. Since most CAE simulations currently use fewer than 250 cores, scaling to 10,000 cores offers a performance gain of more than 40 times. So as the demand for compute power builds and the CAE applications are able to offer extreme scalability, large CAE environments are growing toward 100,000 cores or more.
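As a rough sketch of that arithmetic: the ideal gain is simply the ratio of core counts, while Amdahl's law shows why real codes need aggressive tuning to approach it. The 250- and 10,000-core figures come from the discussion above; the serial-fraction values below are illustrative assumptions, not measurements of any particular CAE code.

```python
def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    """Speedup predicted by Amdahl's law for a given serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

# Ideal (perfectly parallel) gain from 250 cores to 10,000 cores:
ideal_gain = 10_000 / 250  # 40x, as quoted in the text

# With an assumed 0.1% serial fraction, the speedup over a single core
# at 10,000 cores is capped at roughly 900x, far below the core count:
capped = amdahl_speedup(0.001, 10_000)
```

This is why the scaling work described below matters: every residual serial section an ISV removes raises the ceiling on achievable speedup.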

Cray is working with third-party ISVs to further advance simulation. As the leader in providing scalable HPC systems – systems with many thousands of cores – we work with application providers to scale their codes to thousands of cores. The goal is not only to demonstrate the performance advantage but also to quantify the value delivered by scaling CAE simulation to new levels.

Cray’s work with the ISVs has involved profiling applications to enhance scaling throughout the code and across a broad range of simulation features. The joint effort between the system experts and the ISVs is crucial, since the ISVs typically do not have the compute resources to test performance on thousands of cores. Cray has also been able to work with the ISVs, and with leading users, to demonstrate extreme scaling on production models. These runs often require “CPU months” (24 hours/day × 30 days = 720 hours) of compute time, but when using thousands of cores per simulation, this can be turned around in less than a day.

One example is a Metacomp CFD++ simulation performed on a Cray system as a joint project between Cray, Metacomp and Swift Engineering. The goal was to simulate the wake effect as one race car overtakes another. It required an unsteady CFD solution on a model with over 110 million elements, with a moving mesh and rotating boundary conditions (for the tires). Total run time was over 15 “core years” (i.e. more than 130,000 core hours).
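A quick sanity check on those figures: the 15-core-year total comes from the text, while the 10,000-core count used for the wall-clock estimate is my assumption for illustration.

```python
# Convert "core years" to core hours: 15 core years at 8,760 hours/year.
HOURS_PER_YEAR = 365 * 24          # 8,760
core_hours = 15 * HOURS_PER_YEAR   # 131,400 -> "more than 130,000"

# Spread across an assumed 10,000 cores, the wall-clock time comes to
# roughly half a day, consistent with the "less than a day" turnaround
# described above.
wall_clock_hours = core_hours / 10_000
```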

What Does This Mean for HPC in the Manufacturing Segment?

It means the potential is there for many CAE simulations to obtain a factor-of-10 increase in performance (and every factor of 10 helps). Many types of simulation are waiting for this level of performance to become practical in a production environment, such as LES CFD simulations, stochastic simulation of complete systems, and running the largest crash simulations multiple times per day.

Extreme scaling is the only way to achieve this level of performance for a broad range of simulations. It will enable CAE simulations with higher fidelity and faster turnaround. Hence, CAE will add even more value to the product design process.



      Greg Clifford says

      Dennis, thank you for the feedback. It is not clear what caused the curve to dip at ~11K cores, but it is not unusual to see this type of behavior. It could be something with that specific domain decomposition. Also, the runs are not necessarily on a dedicated environment, or the system environment may have changed from one day to the next, so it is possible for that to cause some modest fluctuations. We have run other models which show this level of scaling but do not show this performance dip.


    Jimmy says

    What sources do you have that show the demand for computing power doubling every year?

    For NASTRAN, Abaqus, Fluent, LS-DYNA, do any of these options have a cloud version that allows this scaling over the cloud? Or are these clusters all in one location and run locally?


      Greg says

      1. The statement about the growth in compute power is based mainly on personal discussions with auto and aerospace companies from around the world over the past 25 years (and similar observations by others in the industry). I think the point is that there has been significant growth every year and, as far as I can tell (and somewhat to my surprise), this growth will continue for the foreseeable future. It should be noted that there are challenges, such as power and cooling and software costs, which become a bigger issue every year.

      2. The short answer is yes, these applications can be used in a “cloud” environment, and some organizations offer a cloud solution. But I think your question is more “is it practical to use these apps in a loosely coupled system, or does it require a tightly integrated compute cluster?” It is a matter of opinion as to what is practical, but again the short answer is that a tightly integrated cluster, with a high-speed interconnect, is required for efficient scaling. Also, there are major issues with data management when doing large CAE simulations in the cloud.
