PS: mpn_sqrtrem1

Adrien Prost-Boucle adrien.prost-boucle at
Tue Dec 20 18:09:38 UTC 2016

I'm not sure that using a table of invroot*invroot would bring a speedup.

On one hand, maybe the processor's prefetch stages can read from the table
transparently, but using a table means adding it to the binary and doing a
memory access.

On the other hand, computing invroot*invroot is a simple register-only
arithmetic instruction. This is usually well optimized by compilers and by
register renaming in the processor.

Maybe worth a test?
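
A minimal sketch of such a test, under stated assumptions: the table contents
below are placeholders, not GMP's real inverse-root table, and the names
x0_low, t_sq, square_mul, square_tab and time_loop are hypothetical. It
compares computing x0*x0 directly against rebuilding the square from a table
of 384 16-bit entries, using (256+t)^2 = 65536 + 512*t + t^2 to handle the
msb that is always 1.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define TAB_SIZE 384

/* Placeholder tables (assumption): low 8 bits of x0, and their squares. */
static uint8_t  x0_low[TAB_SIZE];
static uint16_t t_sq[TAB_SIZE];   /* 384 * 2 bytes = 768 bytes */

void init_tables(void)
{
  for (unsigned i = 0; i < TAB_SIZE; i++) {
    /* Arbitrary decreasing values, only for illustration. */
    x0_low[i] = (uint8_t)(255 - (i * 255) / (TAB_SIZE - 1));
    t_sq[i]   = (uint16_t)(x0_low[i] * x0_low[i]);  /* at most 65025 */
  }
}

/* Variant 1: one table read, then a register-only multiply. */
static inline uint32_t square_mul(unsigned i)
{
  uint32_t x0 = 0x100u | x0_low[i];   /* 9 bits, msb always 1 */
  return x0 * x0;
}

/* Variant 2: two table reads plus the extra instructions for the msb. */
static inline uint32_t square_tab(unsigned i)
{
  uint32_t t = x0_low[i];
  return 0x10000u + (t << 9) + t_sq[i];
}

/* Both variants must agree on every table index. */
void check_equivalence(void)
{
  for (unsigned i = 0; i < TAB_SIZE; i++)
    assert(square_mul(i) == square_tab(i));
}

/* Rough CPU-time measurement; the function pointer call and the modulo
   add the same overhead to both variants. */
double time_loop(uint32_t (*f)(unsigned), unsigned long iters)
{
  volatile uint32_t sink = 0;   /* keeps the loop from being optimized out */
  clock_t start = clock();
  for (unsigned long n = 0; n < iters; n++)
    sink += f((unsigned)(n % TAB_SIZE));
  return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

To use it, call init_tables() and check_equivalence() once from main, then
compare time_loop(square_mul, N) with time_loop(square_tab, N) for a large N.
A loop like this keeps the tables hot in cache, so it probably flatters the
table variant compared to real use inside a square root routine.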


On Tue, 2016-12-20 at 15:50 +0100, Torbjörn Granlund wrote:
> > "Marco Bodrato" <bodrato at> writes:
>   > On the other side, both sqrt64_ and sqrt64x2_ use invroot*invroot, maybe
>   > table can store both the value and the squared value.
>   The same comment applies also to current code in GMP, the GMP_NUMB_BITS>32
>   version :-)
> Surely possible.  The cost would be at least 768 bytes.
> The x0 value which is squared is 9 bits, with the msb predictably 1.
> With some extra instructions to handle that msb, 384 16-bit entries
> would work.
> But with extra instructions, the benefits will quickly evaporate...
