Adrien Prost-Boucle adrien.prost-boucle at
Fri Mar 24 21:07:18 UTC 2017

As far as I know, these instructions are affected bu rounding mode.
And no instruction specifies the rounding mode.

So, I have to assume worst-case and consider the user programs may
have set a rounding mode that I don't expect.
That's why I propose - as a first proposal - a function that always
checks the output and applies a correction, +1 or -1.

On the other hand, it may be possible to first save the current
rounding mode, do sqrt, and restore the rounding mode.
Using such instructions would certainly enable to only check one direction.
But the tradeoff may not be in our favor. I didn't test.

About why float versus double for 32-bit inputs:
I did some benchmarking, 2 versions:
- use float and check for correction +1 or -1
- use double and don't do correction
The float+correction is always faster.
Similarly for 64-bit inputs, I tested double+correction and long double
with no correction, and double+correction is always faster.

So, I assume that we use a version with correction.

Now, why float with 23-bit mantissa is enough for 32-bit inputs:
a change on the 16 least significant bits changes the root by only +/-1.
The mantissa may be too short by 9 bits, that'll always be covered by
the +/-1 correction.
My tests confirm this.

However, I've not tested with all possible rounding modes.
The worst corner case (hypothetically) would be that the root is first
too low by 1 because of the mantissa too short, and then too low by an
additional -1 because of rounding mode.
Then, the +/-1 correction would not be enough...
That's probably paranoia, but anyway should not be too difficult to check.
I'll investigate...


On Fri, 2017-03-24 at 15:56 +0100, Torbjörn Granlund wrote:
> I haven't followed this thread carefully, but now I have some questions:
> Are these instructions affected by rounding mode?  Are there variants
> where the instructions themselves specify the rounding mode?
> The adjustment steps look a bit confusing, and perhaps sub-optimal.
> Would it not be possible to set things up (with or without rounding
> control) such that the result is always at most one off and always in
> one direction?  Then a conditional increment (decrement) would suffice.
> Why do you use IEEE single-precision sqrtss when limbs are 32 bits? It
> appears that sqrtsd would give a correct result always after conversion
> back to mp_limb_t, without any "adjustments".

More information about the gmp-devel mailing list