PS: mpn_sqrtrem1

Tue Dec 20 18:09:38 UTC 2016

I'm not sure that a table of invroot*invroot values would bring any speedup.

On one hand, the processor's prefetch stages might read from the table
transparently, but a table means extra data in the binary plus a memory access.

On the other hand, invroot*invroot is a single register-only arithmetic
instruction, which compilers and the processor's register renaming usually
handle very well.

Maybe worth a test?
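
Something along these lines could serve as a starting point for such a test.
It is only a rough sketch, not GMP code: all names are made up, and the table
uses 256 32-bit entries indexed by x0 - 0x100 for simplicity, rather than the
16-bit layout discussed below. The inputs come from a small LCG so the
compiler cannot precompute the squares.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical names throughout; this is not taken from GMP. */
#define X0_MIN 0x100u   /* 9-bit x0 with the msb set ... */
#define X0_MAX 0x1ffu   /* ... so x0 lies in 0x100..0x1ff */

/* Assumed layout: one 32-bit square per x0, 256 entries = 1024 bytes. */
static uint32_t sq_table[X0_MAX - X0_MIN + 1];

static void sq_table_init(void)
{
    for (uint32_t x0 = X0_MIN; x0 <= X0_MAX; x0++)
        sq_table[x0 - X0_MIN] = x0 * x0;
}

/* Variant 1: register-only multiply. */
static uint32_t sq_by_mul(uint32_t x0)
{
    return x0 * x0;
}

/* Variant 2: table lookup (sq_table_init must have been called). */
static uint32_t sq_by_table(uint32_t x0)
{
    return sq_table[x0 - X0_MIN];
}

/* Run f on `iters` pseudo-random x0 values and return the CPU time in
   seconds.  The sum of the results is written to *checksum so that the
   calls cannot be optimized away. */
static double time_squares(uint32_t (*f)(uint32_t), unsigned long iters,
                           uint32_t *checksum)
{
    uint32_t state = 12345u, sum = 0;
    clock_t t0 = clock();
    for (unsigned long i = 0; i < iters; i++) {
        state = state * 1664525u + 1013904223u;  /* LCG step */
        uint32_t x0 = X0_MIN | (state >> 24);    /* top 8 bits -> 0x100..0x1ff */
        sum += f(x0);
    }
    clock_t t1 = clock();
    *checksum = sum;
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

Call sq_table_init() once, then time_squares(sq_by_mul, n, &c) and
time_squares(sq_by_table, n, &c) with a large n and compare. Note that a
tight loop like this keeps the whole table in L1 cache, so it probably
flatters the table version compared to real use inside the sqrt code, and
the LCG step and the indirect call add the same overhead to both variants.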

Adrien

On Tue, 2016-12-20 at 15:50 +0100, Torbjörn Granlund wrote:
> > "Marco Bodrato" <bodrato at mail.dm.unipi.it> writes:
> 
>   > On the other side, both sqrt64_ and sqrt64x2_ use invroot*invroot, maybe
>   > table can store both the value and the squared value.
>   
>   The same comment applies also to current code in GMP, the GMP_NUMB_BITS>32
>   version :-)
>   
> Surely possible.  The cost would be at least 768 bytes.
> 
> The x0 value which is squared is 9 bits, with the msb predictably 1.
> With some extra instructions to handle that msb, 384 16-bit entries
> would work.
> 
> But with extra instructions, the benefits will quickly evaporate...
>