Fast Inverse Square Root
The article “fast inverse sqrt” came to my attention. It shows a small function written in C that is amazingly fast and approximates sqrt(1/x) pretty well. Apparently it was used in the Quake source code to speed up vector normalizations. But how does it work? And can it be improved?
Once you realize that the bit pattern of a positive 32-bit IEEE 754 floating point number interpreted as an integer,

f2i : float → int (reinterpretation of bit pattern),

“satisfies”

f2i(x) ≈ 2^23 · (log2(x) + 127 − c)

for some constant c around 0.045, it explains why the initial estimate of sqrt(1/x) can be computed the way it’s done in the following code snippet:
float InvSqrt(float x) {
    float xhalf = 0.5f * x;
    int i = *(int*)&x;              // store floating-point bits in integer
    i = 0x5f3759d5 - (i >> 1);      // initial guess for Newton's method
    x = *(float*)&i;                // convert new bits into float
    x = x * (1.5f - xhalf * x * x); // one round of Newton's method
    return x;
}
The magic line “i = 0x5f3759d5 - (i >> 1)” is due to

f2i(1/sqrt(x)) ≈ 2^23 · (−(1/2)·log2(x) + 127 − c) ≈ (3/2)·2^23·(127 − c) − (1/2)·f2i(x),

where the leading term (3/2)·2^23·(127 − c) evaluates to a number close to 0x5f3759d5.
The accuracy of the above C implementation is best described by the possible range of relative errors: approximately

−0.001735 ≤ (InvSqrt(x) − 1/sqrt(x)) / (1/sqrt(x)) ≤ 0.

This is a maximum relative error of 0.001735. After some testing and experimenting I derived a modified version by replacing the constants:
float InvSqrt(float x) { // requires <stdint.h> for uint32_t
    uint32_t i = 0x5F1F1412 - (*(uint32_t*)&x >> 1);
    float tmp = *(float*)&i;
    return tmp * (1.69000231f - 0.714158168f * x * tmp * tmp);
}
This function returns floats with a relative error from the range of approximately [−0.00065, 0.00065]. So, the maximum relative error is limited to 0.00065, which is nearly a third of the original version’s error. This improvement comes at no extra cost; I just replaced the constants.
What’s the magic behind it? Here’s a quick explanation of how I got to these coefficients: First, I tried to find another integer for the initial estimate. The goal was not to minimize the relative error but to minimize the size of the error range, measured via e_max − e_min, where e_max and e_min are the maximum respectively minimum possible relative errors. A simple bisection performed by hand led to the above constant of 0x5F1F1412. It could be a local minimum, though; there may be another slightly better choice. Note that since I didn’t minimize the relative error for the initial estimate, it’s off by some factor of around 0.88. But this bias is compensated for in the next step: the other two coefficients are finally chosen to minimize the maximum relative error.
– P
Let me just add that this trick is really a hack. It only works on machines that use 32-bit IEEE 754 floats and the same byte order for ints and floats. Also, the casting isn’t without problems: at least the C++ standard says it invokes undefined behaviour. The only thing that is guaranteed to work is copying a float to an int via memcpy; the above assumptions about float/int representations still apply.
Note that the approximation only does some bit manipulation plus a few floating point multiplications and additions. This is much faster than correctly calculating the square root followed by a floating point division, which are both expensive operations.
pizer
November 26, 2008 at 4:27 pm
Nice work. Some processors have an instruction that estimates this (for example, some PowerPCs; Google for frsqrte). I wonder what algorithm they use.
drj11
May 6, 2009 at 12:24 pm
Hello,
I’ve found this article after making a similar search for constants. The best error I got was 0.0006501978 with this code (yours is 0.0006531066):
float inv_sqrt(float x)
{
    union { float f; uint32_t u; } y = {x};
    y.u = 0x5F1FFF77ul - (y.u >> 1);
    return 0.703974056f * y.f * (2.38919526f - x * y.f * y.f);
}
The search program can be downloaded from my site (rrrola.wz.cz/inv_sqrt.html). I believe this error is optimal to 5 digits.
Rrrola
February 25, 2010 at 4:44 am