PS: mpn_sqrtrem1
Marco Bodrato
bodrato at mail.dm.unipi.it
Tue Dec 20 21:02:36 UTC 2016
Hi,
On Tue, 20 December 2016 7:09 pm, Adrien Prost-Boucle wrote:
> I'm not sure using a table of invroot*invroot would bring speedup.
> But using a table involves adding a table to the binary + doing a memory
> access.
There is a table already. My proposal is to use a single table with
fatter values, each entry holding both the inverse root and its square:
static const uint32_t invsqrt8___ [] =
{(511*511<<10)|511, (509*509<<10)|509, ...}
...
uint64_t invroot = invsqrt8___[(vsh >> 55) - 128];
uint64_t squaredinvroot = invroot >> 10; /* high bits: the square */
invroot &= 0x3ff; /* low 10 bits: the 9-bit root */
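A minimal sketch of the packing idea, not the actual GMP code. The names
pack_entry, entry_root, entry_square and roundtrip_ok are hypothetical, and
the 10-bit field width is simply taken from the << 10 in the snippet above.
It only shows that a 9-bit root and its square (at most 18 bits) fit together
in one 32-bit entry and can be separated again with a shift and a mask:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: low ROOT_BITS bits hold x0, the bits above hold x0*x0. */
#define ROOT_BITS 10
#define ROOT_MASK ((1u << ROOT_BITS) - 1)   /* 0x3ff */

/* Pack a 9-bit value x0 and its square into one 32-bit table entry. */
uint32_t pack_entry(uint32_t x0)
{
    assert(x0 < 512);                       /* x0 must fit in 9 bits */
    return ((x0 * x0) << ROOT_BITS) | x0;   /* at most 18 + 10 = 28 bits */
}

/* Recover the root from a packed entry. */
uint32_t entry_root(uint32_t e)
{
    return e & ROOT_MASK;
}

/* Recover the squared root from a packed entry. */
uint32_t entry_square(uint32_t e)
{
    return e >> ROOT_BITS;
}

/* Check that every 9-bit value with its msb set survives the round trip. */
int roundtrip_ok(void)
{
    for (uint32_t x0 = 256; x0 < 512; x0++) {
        uint32_t e = pack_entry(x0);
        if (entry_root(e) != x0 || entry_square(e) != x0 * x0)
            return 0;
    }
    return 1;
}
```

With this layout a single load yields both values, at the price of 4 bytes
per entry instead of the smaller entries of a root-only table.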
> Maybe worth a test?
Maybe.
> On Tue, 2016-12-20 at 15:50 +0100, Torbjörn Granlund wrote:
>> Surely possible. The cost would be at least 768 bytes.
>>
>> The x0 value which is squared is 9 bits, with the msb predictably 1.
>> With some extra instructions to handle that msb, 384 16-bit entries
>> would work.
16 bits only... clever! But I fear we cannot load the 3 bytes at once,
at least not without wasting an extra byte... or using that byte for another
interleaved table... no, too tricky.
Regards,
m
--
http://bodrato.it/