
Engineering Advantage

Speeding Up Your Analysis – Part 1

January 10, 2017 By: James Kosloski

“Time is Money” – a quote generally attributed to Benjamin Franklin, but the idea is pretty obvious; the faster we can get things done, the better. This certainly holds for simulation as well.  Being able to obtain a solution to your problem in a shorter amount of time is always desired.  So how can we get our analyses to run faster?

There are many things we can do when building and running a model to help speed up the analysis, such as meshing techniques and nonlinear convergence enhancement techniques. Some of these have been covered in previous blogs, such as “How Can I Get My Contact Problem to Converge?”, “The Value of Beam Elements in Structural Analysis” and “Understanding and Using Shell Element Results - Part I”.

In this two-part blog post, I am going to look at how computer resources affect the run time of an analysis.

The three main components of a computer that can affect how quickly an analysis can run are: processors, RAM and hard disk.

In this post, we will look at how the processor, or CPU, affects the run time.

The processor, or CPU, is essentially the number cruncher for the computer. There are 2 ways a processor can affect run times: processor clock speed and number of cores.

CPU Clock Speed

A processor’s clock speed, usually given in gigahertz (GHz), is a measure of how many clock cycles a CPU can perform per second. Basically, it says how many times per second the processor can be asked to do something. The faster the clock speed, the more calculations can be performed in a given time.

Be careful, however. It is only valid to compare clock speeds on similar chips in the same “family” and same “generation”. So, an Intel Xeon chip of the “Broadwell” family running at 3.0 GHz has a 20% higher clock speed than a Broadwell chip running at 2.5 GHz. If the run time is limited purely by the processor, the same analysis will take roughly 17% less time on the 3.0 GHz chip (2.5/3.0 ≈ 0.83). BUT, if we compare a Broadwell family chip to an older Sandy Bridge family chip, both running at 2.5 GHz, the Broadwell chip is going to be much faster. This is because with each new generation, the chips are capable of doing much more per cycle. You may have had a computer from 5 or more years ago with a chip that ran at 3.0 GHz, and find that the chip in your new computer runs at 2.8 GHz; the newer chip will still be much faster than the older chip. Doing more per cycle allows newer chips to run cooler and require less power.
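
The clock speed arithmetic is worth a quick check, because a 20% higher clock speed does not mean 20% less run time. The sketch below is only an illustration: it assumes run time is inversely proportional to clock speed, which holds at best for chips of the same family and generation on a purely CPU-bound job. The function name `runtime_ratio` is made up for this example.

```python
def runtime_ratio(old_ghz: float, new_ghz: float) -> float:
    """Return new_time / old_time, assuming run time scales as 1 / clock speed.

    This is an idealization: it ignores memory, disk, and any
    differences in how much work each chip does per cycle.
    """
    if old_ghz <= 0 or new_ghz <= 0:
        raise ValueError("clock speeds must be positive")
    return old_ghz / new_ghz


# The two Broadwell clock speeds from the text.
ratio = runtime_ratio(old_ghz=2.5, new_ghz=3.0)
clock_increase_percent = (3.0 / 2.5 - 1.0) * 100.0
time_saved_percent = (1.0 - ratio) * 100.0

print(f"Clock speed increase: {clock_increase_percent:.1f}%")
print(f"Run time reduction:   {time_saved_percent:.1f}%")
```

Under that assumption, the faster chip finishes in about 83% of the time, a reduction of roughly 17%.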

Instead of comparing clock speeds, there are several sites that benchmark chips so that it is easier to compare one to the other, for example: www.cpubenchmark.net

Using Multiple Cores

The other way to gain speed from processors is to use more of them. Almost every CPU released in the last several years comes with multiple cores. Each core of a CPU functions as an individual processor, so a CPU with 8 cores can perform 8 calculations simultaneously. Some high-end machines have multiple CPUs, each with multiple cores. So, a workstation with dual Intel Xeon chips, each with 12 cores, will effectively have 24 processors that can be used.

In order to make use of multiple cores, the software you are using must support parallel processing. The code must be designed to effectively divide up the problem so that each available core handles part of the calculations.

Making use of parallel processing can provide significant time savings and, therefore, cost savings. As an example, we performed a linear static structural analysis using the ANSYS Mechanical software for the model shown below:

[Figure: Linear static structural analysis model]

The model consists of approximately 520,000 nodes and 260,000 elements. We ran this on a computer with a single Intel® Xeon® E5-2697 v4 chip at 2.3 GHz. This chip has 18 cores, and we ran the model distributed on 2, 4, 6, 8, 12 and 16 cores. The results are shown in the chart below.

[Figure: Analysis time per iteration vs. number of cores]

The times listed represent the elapsed time (in seconds) to run 1 iteration. Most complex nonlinear jobs can take hundreds or thousands of iterations to complete. It should be noted that parallel processing does not affect the number of iterations required to solve a nonlinear analysis; it only affects how long each iteration takes to run.
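
Because the iteration count stays fixed, the total solve time is simply the time per iteration multiplied by the number of iterations. The sketch below illustrates this with made-up numbers: the per-iteration times and the iteration count are hypothetical placeholders, not measurements from the chart.

```python
def total_solve_time(seconds_per_iteration: float, iterations: int) -> float:
    """Total elapsed solve time in seconds for a fixed number of iterations."""
    if seconds_per_iteration < 0 or iterations < 0:
        raise ValueError("inputs must be non-negative")
    return seconds_per_iteration * iterations


# Hypothetical values for illustration only (not taken from the benchmark).
ITERATIONS = 500                 # same count regardless of core count
SECONDS_PER_ITER_FEW_CORES = 60.0
SECONDS_PER_ITER_MANY_CORES = 20.0

baseline = total_solve_time(SECONDS_PER_ITER_FEW_CORES, ITERATIONS)
parallel = total_solve_time(SECONDS_PER_ITER_MANY_CORES, ITERATIONS)
hours_saved = (baseline - parallel) / 3600.0

print(f"Baseline: {baseline / 3600.0:.2f} h, parallel: {parallel / 3600.0:.2f} h")
print(f"Time saved: {hours_saved:.2f} h")
```

Even a modest saving per iteration adds up quickly once it is multiplied over hundreds of iterations.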

It can be seen from this example that using 12 cores instead of 2 gives more than a 3x speed-up in run time. You can also see that the benefit of each additional core diminishes as more cores are used. For this case, using 16 cores instead of 12 gives no significant benefit. How well a problem scales depends on many factors, but problem size plays a big role. Larger problems will generally scale better to more cores.
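
One common way to reason about this kind of diminishing return is Amdahl's law, which is not something the benchmark above was fitted to; it is brought in here only as an illustration. It assumes that a fixed fraction of the work can be parallelized and the rest must run serially. The parallel fraction of 0.90 below is a hypothetical value, not a property of this model or of ANSYS Mechanical.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speed-up over one core according to Amdahl's law.

    speed-up = 1 / (serial_fraction + parallel_fraction / cores)
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical parallel fraction, chosen only to show the trend.
PARALLEL_FRACTION = 0.90

for cores in (2, 4, 6, 8, 12, 16):
    speedup = amdahl_speedup(PARALLEL_FRACTION, cores)
    efficiency = speedup / cores  # fraction of ideal linear scaling achieved
    print(f"{cores:2d} cores: speed-up {speedup:4.2f}x, efficiency {efficiency:5.1%}")
```

In this idealized model each added core helps less than the one before it, but the speed-up never stops growing. Real runs also pay for communication between cores, which the model ignores, so measured scaling can flatten out sooner than the formula suggests.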

Distributed vs. Shared Memory Parallel Processing

There are two main methods of parallelizing an analysis: distributed processing and shared memory processing. Shared memory parallel (SMP) processing runs on one machine, so all of the cores make use of the same memory; thus the name shared memory. This type of parallel processing is achieved by performing many of the vector calculations in parallel (i.e., breaking up the calculations onto each core).

Distributed processing is done through domain decomposition. The problem is broken up into domains and each domain is sent to a different processor.  Distributed processing may be performed on a single machine with multiple cores or it can be distributed across several machines. By distributing across multiple machines, distributed processing can make use of the RAM and other resources on the other machines. Therefore, jobs that are too large to be solved on a single machine can easily be solved by distributing across multiple machines.
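
The idea of domain decomposition can be sketched in a few lines. The example below is a deliberately simplified illustration, not how ANSYS Mechanical or any other solver actually partitions a mesh: it just splits a range of element numbers into contiguous, nearly equal chunks, one per core. Real solvers also try to minimize the shared boundary between domains, since that boundary drives communication. The function name `decompose` is hypothetical.

```python
from typing import List


def decompose(num_elements: int, num_domains: int) -> List[range]:
    """Split element indices 0..num_elements-1 into contiguous domains.

    Domain sizes differ by at most one element, so the work is
    roughly balanced. Mesh connectivity is ignored in this sketch.
    """
    if num_elements < 0:
        raise ValueError("num_elements must be non-negative")
    if num_domains < 1:
        raise ValueError("num_domains must be at least 1")

    base, extra = divmod(num_elements, num_domains)
    domains = []
    start = 0
    for domain_index in range(num_domains):
        # The first `extra` domains take one additional element each.
        size = base + (1 if domain_index < extra else 0)
        domains.append(range(start, start + size))
        start += size
    return domains


# Element count from the example model, split across 8 cores.
domains = decompose(num_elements=260_000, num_domains=8)
for core, domain in enumerate(domains):
    print(f"core {core}: elements {domain.start}..{domain.stop - 1} ({len(domain)} total)")
```

Each core then solves its own domain and exchanges results at the domain boundaries with its neighbors, which is why communication speed matters as soon as the domains sit on different machines.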

When distributing a job across multiple machines, one needs to be aware of interconnect speeds. Even though each machine may be solving a separate part of the model, there is a lot of communication that must still happen between the processors. The interconnect speed can easily become a bottleneck for the analysis, and with slow interconnects you will find that you do not get very good scaling. A gigabit connection is the absolute minimum, but InfiniBand or a similarly fast interconnect is recommended.

Since different algorithms are used for distributed and shared memory parallelization, the scaling benefits will also be different. They will vary depending on the software being used, the type of analysis, and the size of the problem.

For more information on parallel processing check out some of our previous blogs:

HPC Best Practices for Structural Mechanics - Part I

HPC Best Practices for Structural Mechanics - Part II

Hopefully this post has given you some insight into how your CPU can influence how fast your job will run. In the next installment of this blog, I will look at how RAM and hard disk can also affect your analysis times.