...

Also, floating-point exponentiation is very expensive compared with multiplication (as are transcendental functions). If you're computing the same exponentiation multiple times, compute it once, store the result in a scalar, and reuse it to save on computation. Also, if you're exponentiating by an integer, make sure the exponent is written as an integer! "a**2." consumes far more time than "a**2", which the compiler will likely recognize and change to "a*a".

FMA - Fused multiply-add computes a*x + b for the cost of a single multiplication (on x86 AVX2 and CUDA), so structuring your code to take advantage of it can as much as double the effective number of FLOPS you get.

Accelerator Issues

Pushing Loops Down the Callstack

...