Make it work, profile, optimise. A simple rule that is so hard to follow. My project has to carry out a lot of floating-point arithmetic along its critical path, so I decided to optimise it with SIMD (Single Instruction, Multiple Data) extensions. With SIMD, the processor emulates a vector processor: it packs four 32-bit values into one 128-bit register and lets you operate on all four values at once with a single instruction.
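
To make the "four 32-bit values in one 128-bit register" idea concrete, here is a minimal sketch using the SSE intrinsics from `<xmmintrin.h>`. It assumes an x86 or x86-64 target with SSE support; the helper name `add4` is made up for illustration and is not part of my project's code.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */

/* Hypothetical helper: computes out[i] = a[i] + b[i] for i = 0..3.
 * Each pointer must reference at least four floats. */
void add4(const float *a, const float *b, float *out)
{
    /* Load four 32-bit floats into each 128-bit register.
     * The "u" variants do not require 16-byte alignment. */
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);

    /* One instruction adds all four lanes at once. */
    __m128 sum = _mm_add_ps(va, vb);

    /* Write the four results back to memory. */
    _mm_storeu_ps(out, sum);
}
```

Calling `add4` on `{1, 2, 3, 4}` and `{10, 20, 30, 40}` fills `out` with the four element-wise sums. If the data is known to be 16-byte aligned, the aligned `_mm_load_ps` and `_mm_store_ps` could be used instead; whether that pays off is something to measure rather than assume.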