bodrato at mail.dm.unipi.it
Tue Dec 20 03:00:14 UTC 2016
On Mon, 19 December 2016 6:21 pm, Adrien Prost-Boucle wrote:
> That said, the interesting part in my code is these functions:
> - sqrt32_inv() for single 32-bit words
> - sqrt64_inv() for single 64-bit words
> - sqrt64x2_inv() for double 64-bit words
Is there a reason why you defined three different invsqrt8_ arrays?
Doesn't invsqrttab already contain suitable values?
On the other hand, both sqrt64_ and sqrt64x2_ compute invroot*invroot; maybe
the table could store both the value and its square.
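
To make the suggestion concrete, here is a rough sketch of a table that
keeps each value next to its square. Everything in it is an assumption made
for illustration: the names (invsqrt_entry, tab, build_table, lookup), the
index range 64..255 and the 2^12 scaling are hypothetical and are not taken
from your code or from GMP's invsqrttab.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: each entry keeps r ~ 2^12 / sqrt(i) together
   with r*r, so a caller that needs the square loads it instead of
   multiplying. */
typedef struct {
    uint16_t inv;     /* r = floor(sqrt(2^24 / i)), between 256 and 512 */
    uint32_t inv_sq;  /* r * r, precomputed */
} invsqrt_entry;

#define TAB_FIRST 64   /* assumed: index is the top byte of a normalized operand */
#define TAB_SIZE  192  /* indices 64..255 */

static invsqrt_entry tab[TAB_SIZE];

/* floor(sqrt(n)) by linear search; only used while building the table. */
static uint32_t isqrt_u32(uint32_t n)
{
    uint32_t r = 0;
    while ((uint64_t)(r + 1) * (r + 1) <= n)
        r++;
    return r;
}

static void build_table(void)
{
    for (uint32_t i = TAB_FIRST; i < 256; i++) {
        uint32_t r = isqrt_u32((1u << 24) / i);
        tab[i - TAB_FIRST].inv = (uint16_t)r;
        tab[i - TAB_FIRST].inv_sq = r * r;
    }
}

/* One lookup returns both the approximation and its square. */
static invsqrt_entry lookup(uint8_t top_byte)
{
    assert(top_byte >= TAB_FIRST);
    return tab[top_byte - TAB_FIRST];
}
```

The price is a larger table (here 8 bytes per entry after padding instead
of 2), so whether it helps depends on cache behaviour as much as on the
multiplication it saves.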
> I noted that the GMP fallback function umul_ppmm(), in longlong.h in the GMP
> code, uses 4 multiplications where the Karatsuba method would only require 3,
> I was wondering whether optimization was possible...
Reducing the number of multiplications is possible... but I bet a
Karatsuba umul_ppmm() would not be faster than the plain version (at least
not on current 64-bit CPUs ;-)
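
For comparison, here is a minimal sketch of both variants, scaled down to
32-bit words split into 16-bit halves so that the result can be checked
against a native 64-bit product. The function names are hypothetical. The
four-multiplication version is only modelled on the shape of the generic C
fallback, and the three-multiplication version uses the subtractive form of
Karatsuba; neither is code from GMP.

```c
#include <assert.h>
#include <stdint.h>

/* Schoolbook: four 16x16->32 products. */
static uint64_t mul32_school(uint32_t u, uint32_t v)
{
    uint32_t ul = u & 0xFFFF, uh = u >> 16;
    uint32_t vl = v & 0xFFFF, vh = v >> 16;
    uint32_t x0 = ul * vl, x1 = ul * vh, x2 = uh * vl, x3 = uh * vh;

    x1 += x0 >> 16;              /* cannot overflow */
    x1 += x2;                    /* may overflow */
    if (x1 < x2)
        x3 += 0x10000;           /* propagate that carry */

    uint32_t hi = x3 + (x1 >> 16);
    uint32_t lo = (x1 << 16) | (x0 & 0xFFFF);
    return ((uint64_t)hi << 32) | lo;
}

/* Subtractive Karatsuba: three products, but more bookkeeping. */
static uint64_t mul32_karatsuba(uint32_t u, uint32_t v)
{
    uint32_t ul = u & 0xFFFF, uh = u >> 16;
    uint32_t vl = v & 0xFFFF, vh = v >> 16;
    uint32_t x0 = ul * vl;
    uint32_t x3 = uh * vh;

    /* |uh-ul| * |vh-vl| fits in 32 bits; track its sign separately. */
    uint32_t du = uh >= ul ? uh - ul : ul - uh;
    uint32_t dv = vh >= vl ? vh - vl : vl - vh;
    int neg = (uh < ul) != (vh < vl);
    uint32_t p = du * dv;

    /* mid = ul*vh + uh*vl = x0 + x3 - (uh-ul)*(vh-vl); it needs 33 bits,
       held as carry bit c plus the 32-bit word mid. */
    uint32_t mid = x0 + x3;
    uint32_t c = mid < x0;
    if (neg) {
        mid += p;
        c += mid < p;
    } else {
        c -= mid < p;            /* borrow, decided before subtracting */
        mid -= p;
    }

    uint32_t lo = x0 + (mid << 16);
    uint32_t hi = x3 + (mid >> 16) + (c << 16) + (lo < x0);
    return ((uint64_t)hi << 32) | lo;
}

/* Compare both variants against the native 64-bit product. */
static void check_mul32(uint32_t u, uint32_t v)
{
    uint64_t expect = (uint64_t)u * v;
    assert(mul32_school(u, v) == expect);
    assert(mul32_karatsuba(u, v) == expect);
}
```

The third multiplication is traded for two absolute differences, a sign
test and a 33-bit middle term with its own carry handling. Whether that
trade pays off on a given CPU would have to be measured.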