Generalizing Montgomery mod algorithm

Torbjorn Granlund tg at
Thu Jun 14 15:39:19 CEST 2012

nisse at (Niels Möller) writes:

  I think I have (re-)discovered a different schoolbook division
  algorithm. I'll describe the case that we only want the remainder. I'm
  fairly confident one can assemble the quotient as well, with a bit of
  extra book-keeping, but I'm less sure that it can be faster than a
  schoolbook division loop around 3/2 division.
I agree this method could make a faster SB mpn_tdiv_r, but I am
unconvinced that you can get the quotient with fewer operatons than the
current code.

Perhaps one could base a SB mpn_tdiv_qr using these ideas on parallel
computation of the new dn-limb remainder and the corresponding quotient
limb.  This would mainly move the quotient limb computation off of the
critical path.  A problem that might then show up is that final
adjusting of the quotient limb might be tricky.

We should also be aware of that SB mpn_tdiv_qr could be rewritten with
speculative quotient limb computation, in effect (but not completely)
removing it from the critical path.  The idea is to compute the top
partial remainder limbs first before invoking mpn_submul_1, and stick
the *next* qlimb computation also before calling mpn_submul_1.

You should implement a SB mpn_tdiv_r based on your ideas.  It ought to
outperform the current code most when the divisor is small and the
dividend is fairly large.

(Almost) unrelated to performance is the side-chanell leakage
properties.  The new code could (as you mention) probably be easy to
make side-chanell silent.


More information about the gmp-devel mailing list