Side channel silent karatsuba / mpn_addmul_2 karatsuba

Torbjörn Granlund tg at
Thu Dec 13 12:07:49 UTC 2018

nisse at (Niels Möller) writes:

  > I am looking into doing karatsuba multiplication at evaluation points 0,
  > +1, and infinity.  We usually evaluate in -1 instead of +1.

  It will be very interesting to see what thresholds we get with that.

Marco's code might be in shape to enlighten us about that!

  I don't think we should rule out using -1. It needs side-channel silent
  conditional negation, but that's not so hard. sec_invert.c implements
  mpn_cnd_neg in in terms of mpn_lshift and mpn_cnd_sub_n, one could also
  do it as a single pass xor:ing a mask + propagating carry.

  What's most efficient is not at all clear to me: negation costs O(n),
  but so does handling of the extra top bits one get with evaluation in

My hope is to handle those extra bits for free as one adds parts
together.  (I am sure Marco gave all this a thought, looking at his
solutions would be enlightening.)

  > My first project is to write some mpn_addmul_2 assembly functions for
  > machines with slow 64b x 64b -> 128b hardware.  E.g., all ARM designed
  > Arm64 cores have a throughput of 1/7 per cycle for that operation, while
  > plain operations has a throughput of 2 to 3 per cycle.  If things pan out,
  > we might be able to reach a addmul accumulation performance of 5.25 cycles
  > per limb.  Very far from great, but less awful than 7.

  I did a sketch for x86 a while ago, but didn't complete it. See This
  appears to use evaluation in -1.

Did you time and/or test it?

I am still playing in C with longlong.h, but I will likely try creating
something for arm64 (targeting at least a57, a72, a73, and perhaps a53).

Please encrypt, key id 0xC8601622

More information about the gmp-devel mailing list