Possible to optimize a base 2 Fermat primality test using shifts?

Hans L thehans at gmail.com
Fri Apr 12 02:44:25 UTC 2019


Hello,
I've been learning to use the GMP library lately and doing some experimenting
with searching for various specific types of prime numbers.
I am currently interested in any ways of increasing throughput over simply
calling mpz_probab_prime_p.
One technique I've been using to quickly filter out composites and roughly
balance load between pieces of code is to pipe data between various small
utilities using mpz_out_raw/mpz_inp_raw, each of which filters
progressively more in stages, beginning with some trial division, then
moving to a Fermat test, etc.  If an input doesn't pass the test in one
utility, it is not passed further down the pipeline.
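
To make that concrete, here is a stripped-down sketch of what one such
filter stage looks like.  passes_this_stage is just a placeholder name
for whatever test a given stage performs; the trial-division list here
is only an example.

#include <stdio.h>
#include <gmp.h>

/* Placeholder test for this stage; here just trial division by a few
   small primes, but it could equally be a Fermat test, etc.  */
static int
passes_this_stage (const mpz_t n)
{
  static const unsigned long p[] = { 3, 5, 7, 11, 13 };
  size_t i;
  for (i = 0; i < sizeof p / sizeof p[0]; i++)
    if (mpz_divisible_ui_p (n, p[i]) && mpz_cmp_ui (n, p[i]) != 0)
      return 0;
  return 1;
}

int
main (void)
{
  mpz_t n;
  mpz_init (n);
  /* Read raw candidates from the previous stage; forward only the
     survivors to the next stage.  */
  while (mpz_inp_raw (n, stdin) != 0)
    if (passes_this_stage (n))
      mpz_out_raw (stdout, n);
  mpz_clear (n);
  return 0;
}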

I noticed that mpz_probab_prime_p does a Fermat primality test with base 210
before moving on to Miller-Rabin, with the following code:
/* Perform a Fermat test.  */
  mpz_set_ui (x, 210L);
  mpz_powm (y, x, nm1, n);
  if (mpz_cmp_ui (y, 1L) != 0)
    {
      TMP_FREE;
      return 0;
    }

Is base 210 significantly stronger than a base 2 test?
I'm wondering because it seems like a specialized function for base 2 could
filter out weak candidates faster than the generalized mpz_powm used here.
I guess what I'm proposing is a new function signature such as:

void mpz_powm_2exp(mpz_t rop, const mpz_t exp, const mpz_t mod)
Set rop to (2 raised to exp) modulo mod.
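
Just to make the idea concrete, here is a naive mpz-level sketch (not the
optimized mpn-level implementation I have in mind), where the
multiply-by-the-base step of plain square-and-multiply becomes a one-bit
shift plus a conditional subtraction:

#include <gmp.h>

/* Naive sketch only: left-to-right binary exponentiation computing
   2^exp mod mod, with the multiplication by the base replaced by a
   single-bit left shift and a conditional subtraction.  */
void
mpz_powm_2exp (mpz_t rop, const mpz_t exp, const mpz_t mod)
{
  mp_bitcnt_t i = mpz_sizeinbase (exp, 2);

  mpz_set_ui (rop, 1);
  while (i-- > 0)
    {
      mpz_mul (rop, rop, rop);        /* square */
      mpz_mod (rop, rop, mod);
      if (mpz_tstbit (exp, i))
        {
          mpz_mul_2exp (rop, rop, 1); /* multiply by 2 == shift left 1 */
          if (mpz_cmp (rop, mod) >= 0)
            mpz_sub (rop, rop, mod);
        }
    }
}

A Fermat filter would then just call this with exp = n-1 and compare the
result against 1.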

I've only just read a little bit about how k-ary window exponentiation
works; the math behind it is still pretty new and difficult to me.  So I'm
not really clear whether it's directly applicable to drop in lshifts as
replacements for multiplications.  But it seems that with k <=
log2(bits_per_limb), e.g. 6 for a 64-bit machine, mpn_lshift might be
applicable.  And for higher values, I think the additional bits could
still be applied by calling mpn_lshift and offsetting the destination limb?
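
In mpz terms (ignoring the mpn-level details for now), what I mean is
something like the sketch below, with the window width K hard-coded and the
function name only illustrative: with base 2, the table of precomputed
powers in the general 2^k-ary method is unnecessary, because multiplying by
2^d for a window value d is just a shift by d bits.

#include <gmp.h>

#define K 6   /* window width, e.g. 6 bits on a 64-bit machine */

/* Sketch of a 2^k-ary variant for base 2: process the exponent in
   windows of up to K bits; each window value d is applied with a
   single shift of d bits instead of a table lookup and multiply.  */
static void
powm_2exp_kary (mpz_t r, const mpz_t exp, const mpz_t mod)
{
  mp_bitcnt_t i = mpz_sizeinbase (exp, 2);

  mpz_set_ui (r, 1);
  while (i > 0)
    {
      int w = (i >= K) ? K : (int) i;
      unsigned long d = 0;
      int j;

      /* Collect the next window of w exponent bits, most significant
         first.  */
      for (j = 0; j < w; j++)
        d = (d << 1) | mpz_tstbit (exp, --i);

      /* Square w times ...  */
      for (j = 0; j < w; j++)
        {
          mpz_mul (r, r, r);
          mpz_mod (r, r, mod);
        }
      /* ... then multiply by 2^d, which is only a shift.  */
      if (d != 0)
        {
          mpz_mul_2exp (r, r, d);
          mpz_mod (r, r, mod);
        }
    }
}

For d up to the limb size that final shift is essentially one mpn_lshift,
and for larger windows I imagine the destination limb could be offset as
described above.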

One last comment I have is regarding the code for setting k, which
looks up the bit count in this array:
static mp_bitcnt_t x[] =
{0,7,25,81,241,673,1793,4609,11521,28161,~(mp_bitcnt_t)0};

However, I noticed that Wikipedia cites an equation for the optimal setting
of k, which results in different values than these:
https://en.wikipedia.org/wiki/Exponentiation_by_squaring#2k-ary_method
From the cited source:
"For optimal efficiency, k should be equal to the smallest integer
satisfying lg(n) <= k*(k+1)*2^(2k) / (2^(k+1)-k-2)"
Cohen, H.; Frey, G., eds. (2006). *Handbook of Elliptic and Hyperelliptic
Curve Cryptography*

I calculated the right-hand side for each k up to 53 (k=54 would exceed
64-bit unsigned limits), and these are the values I ended up with:
static mp_bitcnt_t x[] =
{0, 8UL, 24UL, 70UL, 197UL, 539UL, 1434UL, 3715UL, 9400UL,
 23291UL, 56652UL, 135599UL, 320035UL, 746156UL, 1721161UL, 3933181UL,
 8914121UL, 20055470UL, 44828335UL, 99616716UL, 220203271UL, 484444769UL,
 1061161949UL, 2315259259UL, 5033168701UL, 10905194788UL, 23555216179UL,
 50734306666UL, 108984801227UL, 233538853463UL, 499289955601UL,
 1065151897593UL, 2267742741265UL, 4818953315930UL, 10222022175191UL,
 21646635183496UL, 45767171518831UL, 96619584304525UL, 203684529060325UL,
 428809534848631UL, 901599534793541UL, 1893359023048784UL,
 3971435999546779UL, 8321103999030054UL, 17416264183994611UL,
 36415825111965443UL, 76068612456080729UL, 158751886864837621UL,
 331014572611760857UL, 689613692941138438UL, 1435522381224378751UL,
 2985886552946673988UL, 6205960286516580695UL, 12889302133534398905UL,
 ~(mp_bitcnt_t)0};

Also, I used ceil on the result; I'm not sure if that's optimal vs. floor or
rounding, but you could also argue that it doesn't make 1 bit of difference
;p
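
For reference, the table above should be reproducible with GMP itself,
using exact integer arithmetic and a ceiling division; the program below
is just a sketch of that calculation.

#include <stdio.h>
#include <gmp.h>

int
main (void)
{
  mpz_t num, den, thr;
  unsigned long k;

  mpz_inits (num, den, thr, NULL);
  for (k = 1; k <= 53; k++)
    {
      /* numerator = k*(k+1)*2^(2k) */
      mpz_set_ui (num, k * (k + 1));
      mpz_mul_2exp (num, num, 2 * k);
      /* denominator = 2^(k+1) - k - 2 */
      mpz_set_ui (den, 1);
      mpz_mul_2exp (den, den, k + 1);
      mpz_sub_ui (den, den, k + 2);
      /* threshold = ceil(numerator / denominator) */
      mpz_cdiv_q (thr, num, den);
      gmp_printf ("k=%lu: %Zd\n", k, thr);
    }
  mpz_clears (num, den, thr, NULL);
  return 0;
}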
So anyway, I don't know whether the rest of the code would even work
with k values beyond the existing list, but I thought I would mention that
these might be more optimal.

Any thoughts, comments on this?

-Hans Loeblich

