Fast Inverse Square Root

The article “fast inverse sqrt” came to my attention. It shows a small function written in C which is amazingly fast and approximates sqrt(1/x) pretty well. Appearently it was used in the Quake source code to speed up vector normalizations. But how does it work? Also, can it be improved?

Once you realize that the bit pattern of a positive 32-bit IEEE 754 floating point number interpreted as integer (f2i) “satisfies”

f2i : float $\mapsto$ int (reinterpretation of bit pattern)

f2i $(x) \approx 2^{23} \log_2(x) + C$

for some constant $C$ around $2^{30}$ it explains why the initial estimate of sqrt(1/x) can be computed the way it’s done in the following code snippet:

float InvSqrt(float x){
   float xhalf = 0.5f * x;
   int i = *(int*)&x; // store floating-point bits in integer
   i = 0x5f3759d5 - (i >> 1); // initial guess for Newton's method
   x = *(float*)&i; // convert new bits into float
   x = x*(1.5f - xhalf*x*x); // One round of Newton's method
   return x;
}

The magic line “i = 0x5f3759d5 – (i >> 1)” is due to

$\log{\sqrt{1/x}} = -\frac{1}{2} \log{x}$

$\Rightarrow f2i( \sqrt{1/x} ) \approx C - \frac{f2i(x) - C}{2} = \frac{3}{2} C - \frac{1}{2} f2i(x)$

The accuracy of the above C implementation is best described by the possible range of relative errors:

$-0.001735 \, \ldots \, 0$

This is a maximum relative error of 0.001735. After some testing and experimenting I derived a modified version by replacing the constants:


float InvSqrt(float x){
   uint32_t i = 0x5F1F1412 - (*(uint32_t*)&x >> 1);
   float tmp = *(float*)&i;
   return tmp * (1.69000231f - 0.714158168f * x * tmp * tmp);
}

This function returns floats with a relative error from the range

$-0.00065 \, \ldots \, 0.00065$

So, the maximum relative error is limited to 0.00065 which is nearly a third of the original version’s error. This improvement comes with no extra cost. I just replaced the constants.

What’s the magic behind it? Here’s a quick explanation of how I got to these coefficients: First, I tried to find another integer for the initial estimate. The goal was not to minimize the relative error but to minimize the size of the error range measured via $(1+e_{max})/(1+e_{min})$ where $e_{max}$ and $e_{min}$ are the maximum respective minimum possible relative errors. A simple bisection performed by hand led to the above constant of 0x5F1F1412. It could be a local minimum, though. There may be another slightly better choice. Note that since I didn’t minimize the relative error for the initial estimate it’s off by some factor of around 0.88. But this bias is compensated for in the next step. The other two coefficients are finally chosen to minimize the maximum relative error.

– P

Written by pizer

October 12, 2008 at 1:17 am

Posted in Math

Tagged with approximation, fast, sqrt

3 Responses

Subscribe to comments with RSS.

Let me just add that this trick is really a hack. It only works on machines that use 32 bit IEEE 754 floats and the same byte order for ints and floats. Also the casting isn’t without problems. At least the C++ standard says it invokes undefined behaviour. The only thing that is guaranteed to work is to copy a float to an int by memcpy. The above assumptions about float / int representations still apply.

Note that the approximation only does some bit manipulation and some floating point multiplications and additions. This is much faster than correctly calculating the square root followed by a floating point divition which are both expensive operations.

pizer

November 26, 2008 at 4:27 pm
Nice work. Some processors have an instruction that estimates this (or example, some PowerPCs, Google for frsqrte). I wonder what algorithm they use.

drj11

May 6, 2009 at 12:24 pm
Hello,

I’ve found this article after making a similar search for constants. The best error I got was 0.0006501978 with this code (yours is 0.0006531066):

float inv_sqrt(float x)
{ union { float f; uint32 u; } y = {x};
y.u = 0x5F1FFF77ul – (y.u >> 1);
return 0.703974056f * y.f * (2.38919526f – x * y.f * y.f);
}

The search program can be downloaded on my site (rrrola.wz.cz/inv_sqrt.html). I believe this error is optimal to 5 digits.

Rrrola

February 25, 2010 at 4:44 am

Pizer’s Weblog