Side channel silent karatsuba / mpn_addmul_2 karatsuba
tg at gmplib.org
Thu Dec 13 15:59:45 UTC 2018
nisse at lysator.liu.se (Niels Möller) writes:
With evaluation at +1, (a1+a0) and (b1+b0) are each n limbs + a bit, and
the product is 2n limbs + 2 bits. If we handle the extra bits
separately, rather than doing it as an (n+1)x(n+1) multiply, I think that
would be two conditional n-limb additions, at weight B^2 in the final
product, and a single bit (the and of the two extra bits) at weight B^3.
I see that the last bit has some potential for folding with some other
carry propagation pass over the highest n limbs, but I don't see how to
do it cheaper than 2n carry propagations.
So +1 sounds better than -1. But it were still surprisingly many
operations on the n=1 case.
Yes, we trade a multiply for a bunch of additions.
For some archs, notably ARM inc's Arm64 cores, we have at least 15 free
insn slots available for plain operations (add, sub, and, etc). (This is
because the 3 muls need 21 cycles.)
As far as I remember, I ran tests, and they failed...
Then, it might perhaps be buggy. ;-)
Please encrypt, key id 0xC8601622
More information about the gmp-devel