adrien.prost-boucle at laposte.net
Tue Dec 20 18:09:38 UTC 2016
I'm not sure that using a table of invroot*invroot would bring a speedup.
On one hand, the processor's prefetch stages may be able to read from the table transparently,
but using a table means adding it to the binary and doing a memory access.
On the other hand, computing invroot*invroot is a simple register-only arithmetic instruction,
which compilers and the processor's register renaming usually handle very well.
Maybe worth a test?
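
A minimal sketch of such a test, in C. It is not taken from GMP: the names sq_table, init_sq_table, square_mul, square_tab and time_variant are made up for illustration, and it assumes x0 is a 9-bit value with its msb set (0x100..0x1ff), as described in the quoted message. The table here uses 32-bit entries for simplicity, since x0*x0 can need up to 18 bits.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical table of squares for every 9-bit x0 with msb set
   (0x100..0x1ff), indexed by the low 8 bits of x0. */
static uint32_t sq_table[256];

void init_sq_table(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t x0 = 0x100u | i;
        sq_table[i] = x0 * x0;
    }
}

/* Variant 1: register-only multiply. */
uint32_t square_mul(uint32_t x0)
{
    return x0 * x0;
}

/* Variant 2: one memory access into the precomputed table. */
uint32_t square_tab(uint32_t x0)
{
    assert(x0 >= 0x100u && x0 <= 0x1ffu);
    return sq_table[x0 & 0xffu];
}

/* Run `iters` calls of one variant and return the CPU time in seconds.
   The checksum stored in *sum keeps the compiler from dropping the loop. */
double time_variant(uint32_t (*f)(uint32_t), uint32_t iters, uint64_t *sum)
{
    uint64_t acc = 0;
    uint32_t x0 = 0x100u;
    clock_t t0 = clock();
    for (uint32_t i = 0; i < iters; i++) {
        acc += f(x0);
        /* Vary the input while staying inside 0x100..0x1ff. */
        x0 = 0x100u | ((x0 * 37u + 11u) & 0xffu);
    }
    clock_t t1 = clock();
    *sum = acc;
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

Call init_sq_table() once, then time_variant(square_mul, n, &s) and time_variant(square_tab, n, &s) with the same n. Note that the call through a function pointer adds overhead to both variants, so this only gives a rough idea; a meaningful measurement would have to put each variant inside the actual sqrt code.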
On Tue, 2016-12-20 at 15:50 +0100, Torbjörn Granlund wrote:
> > "Marco Bodrato" <bodrato at mail.dm.unipi.it> writes:
> > On the other hand, both sqrt64_ and sqrt64x2_ use invroot*invroot, so maybe
> > the table can store both the value and the squared value.
> The same comment applies also to current code in GMP, the GMP_NUMB_BITS>32
> version :-)
> Surely possible. The cost would be at least 768 bytes.
> The x0 value which is squared is 9 bits, with the msb predictably 1.
> With some extra instructions to handle that msb, 384 16-bit entries
> would work.
> But with extra instructions, the benefits will quickly evaporate...