Optimising a simple 2D grid simulation
This post will present a series of video showing how I took a simple 2D grid simulation and made it quicker and more efficient using a variety of techniques.
In this first video we look at the base simulation code and the overall framework of code I will be using for the simulation. It should be noted we are only worried about using an “Embarrassingly Parallel algorithm” in this demo for simplicity and to highlight the processes of speeding things up. In a future post I will look at methods for doing more complex algorithms involving spatial data structures and acceleration structures.
In this version I re-structure the code to use a Structure of Arrays SOA style layout. This has an advantage of making the code more cache friendly. It also has the major advantage of making loading to OpenGL easier and making the rest of the optimisations I’m going to make easier.
In this demo I convert the drawing routines to use Texture Buffer Objects (TBO)[https://www.khronos.org/opengl/wiki/Buffer_Texture] and use texelFetch to get the data.
As we saw in the previous demo we are not really exercising the CPU very well to do this we need to partition the work to use more of the CPU cores. We could use OpenMP parallel for, however this is not natively supported on the mac version of clang++ so in this demo I write my own based on the code here
In this version I use the parallel_for algorithm from Intel’s
tbb I installed this via vcpkg by using
./vcpkg install tbb:x64-windows
The biggest problem with the previous example was the data layout and how we updated the Texture buffers. In this demo we throw away the Vec3x8 class and hard code everything for speed (This is something that is mentioned a lot in Data Oriented Design texts)
Conclusions and future work
This has been a fun exercise, I still need to complete a Computer Shader and CUDA version of the code to do more comparisons. In the next set of demos I will look at a more complex problem.