Possible to optimize for base 2 fermat primality test using shifts?

Sat May 18 20:57:43 UTC 2019

I've developed some code and done a little bit of performance testing.

On Sat, Apr 13, 2019 at 5:48 AM Marco Bodrato <bodrato at mail.dm.unipi.it>
wrote:

> I do not think it depends on the k-ary window, but it depends on the
> representation we use for numbers...
> The manual ( https://gmplib.org/manual/Modular-Powering-Algorithm.html )
> says that GMP uses "REDC method by Montgomery" to speed-up modular
> operations. If you use lshifts, that you have to use plain modular
> functions. Maybe it's worth, maybe it's not. You should test it.
>

Yes, after diving into GMP code and really trying to understand it some
more I realized this wouldn't work with REDC.

So for now I've implemented a Left to Right k-ary exponentiation (14.82
from Handbook of Applied Cryptography
http://cacr.uwaterloo.ca/hac/about/chap14.pdf ).  Since I'm only using base
2, there is no pre-computation steps, just getbits and shift by that amount
in between squarings.
The performance ratio (time to complete with my code / time using gmp
native mpz_powm ) is about 2.6-3.5 for values of a few limbs, so
significantly slower.  Although for very large numbers about 1000 limbs(I
tested against the 5000th primorial+1, basically: mpz_primorial_ui(P,
48611) + 1 ) it is almost comparable at ~1.25 ratio.  The code is here in
case anyone is interested:
https://gist.github.com/thehans/2e120447dcd5269afa5a8a7ca1fa5eac

Looking into these algorithms some more, it found it interesting that "For
the largest divisions, a block-wise Barrett division algorithm is used."
https://gmplib.org/manual/Block_002dWise-Barrett-Division.html#Block_002dWise-Barrett-Division,
however, this doesn't seem to apply to modular exponentiation, since that
only supports REDC?  I wonder if there is some possible speedup to be
gained from that.  If so, it wouldn't really be specific to base-2
exponentiation either, since I think most of the Barrett operations already
take advantage of shifts.

So my next step is to see if I can change my code to make use of Barrett
reduction.  I'm still trying to understand if I can drop in some various
Barrett related function calls which already exist in GMP (as I understand
these are the ones with "mu" in the name), but there isn't much
documentation on these particular low level functions.
The crypto handbook shows algorithms for Barrett reduction and
exponentiation independently, but I didn't see a direct example of the two
combined to go off of.

In searching for some examples, I also came across this blog for another
piece of software, where they compare performance with GMP, and benchmark a
few variations of the algorithms:
http://axtls.sourceforge.net/crypto_blog.htm
In the end, after their optimizations, their Barrett implementation was
notably faster than Montgomery for all their tested input sizes: 512-4096
bits. (But still slower than gmp's)

Hans Loeblich