Side channel silent karatsuba / mpn_addmul_2 karatsuba

Thu Dec 13 12:07:49 UTC 2018

nisse at lysator.liu.se (Niels Möller) writes:

  > I am looking into doing karatsuba multiplication at evaluation points 0,
  > +1, and infinity.  We usually evaluate in -1 instead of +1.

  It will be very interesting to see what thresholds we get with that.

Marco's code might be in shape to enlighten us about that!

  I don't think we should rule out using -1. It needs side-channel silent
  conditional negation, but that's not so hard. sec_invert.c implements
  mpn_cnd_neg in in terms of mpn_lshift and mpn_cnd_sub_n, one could also
  do it as a single pass xor:ing a mask + propagating carry.

  What's most efficient is not at all clear to me: negation costs O(n),
  but so does handling of the extra top bits one get with evaluation in
  +1.

My hope is to handle those extra bits for free as one adds parts
together.  (I am sure Marco gave all this a thought, looking at his
solutions would be enlightening.)

  > My first project is to write some mpn_addmul_2 assembly functions for
  > machines with slow 64b x 64b -> 128b hardware.  E.g., all ARM designed
  > Arm64 cores have a throughput of 1/7 per cycle for that operation, while
  > plain operations has a throughput of 2 to 3 per cycle.  If things pan out,
  > we might be able to reach a addmul accumulation performance of 5.25 cycles
  > per limb.  Very far from great, but less awful than 7.

  I did a sketch for x86 a while ago, but didn't complete it. See
  https://gmplib.org/list-archives/gmp-devel/2017-June/004651.html. This
  appears to use evaluation in -1.

Did you time and/or test it?

I am still playing in C with longlong.h, but I will likely try creating
something for arm64 (targeting at least a57, a72, a73, and perhaps a53).

-- 
Torbjörn
Please encrypt, key id 0xC8601622