Side channel silent karatsuba / mpn_addmul_2 karatsuba
Torbjörn Granlund
tg at gmplib.org
Thu Dec 13 12:07:49 UTC 2018
nisse at lysator.liu.se (Niels Möller) writes:
> I am looking into doing karatsuba multiplication at evaluation points 0,
> +1, and infinity. We usually evaluate in -1 instead of +1.
It will be very interesting to see what thresholds we get with that.
Marco's code might be in shape to enlighten us about that!
I don't think we should rule out using -1. It needs side-channel silent
conditional negation, but that's not so hard. sec_invert.c implements
mpn_cnd_neg in in terms of mpn_lshift and mpn_cnd_sub_n, one could also
do it as a single pass xor:ing a mask + propagating carry.
What's most efficient is not at all clear to me: negation costs O(n),
but so does handling of the extra top bits one get with evaluation in
+1.
My hope is to handle those extra bits for free as one adds parts
together. (I am sure Marco gave all this a thought, looking at his
solutions would be enlightening.)
> My first project is to write some mpn_addmul_2 assembly functions for
> machines with slow 64b x 64b -> 128b hardware. E.g., all ARM designed
> Arm64 cores have a throughput of 1/7 per cycle for that operation, while
> plain operations has a throughput of 2 to 3 per cycle. If things pan out,
> we might be able to reach a addmul accumulation performance of 5.25 cycles
> per limb. Very far from great, but less awful than 7.
I did a sketch for x86 a while ago, but didn't complete it. See
https://gmplib.org/list-archives/gmp-devel/2017-June/004651.html. This
appears to use evaluation in -1.
Did you time and/or test it?
I am still playing in C with longlong.h, but I will likely try creating
something for arm64 (targeting at least a57, a72, a73, and perhaps a53).
--
Torbjörn
Please encrypt, key id 0xC8601622
More information about the gmp-devel
mailing list