Torbjorn Granlund tg at
Wed Apr 2 17:33:37 UTC 2014

nisse at (Niels Möller) writes:

  I don't remember any fundamental difficulties.
Except the 2^m-1 one.

  * Tweaking the roots to compute mod 2^m+1 rather than 2^m-1 (since 2^m-1
    is better handled by splitting in mulmod_bnm1).
Perhaps having both variants would be useful, a CRT of 2^m+1 and 2^m-1
is particularly easy, and would perhaps result in faster general
multiply than mpn_mulmod_bnm1 approach of roughly 2^m+1, 2^{m/2}+1,
2^{m/4}+1, ...  But if the cost is duplicated tables of roots-of-unity,
then it might not be worth it.

  * Implementing David Harvey's trick: slightly smaller primes, but
    butterflies sped up from 9-10 or so cycles down to 6 (this was
    benchmarked on Opteron, where 6 is the limit from multiplier
That indeed seems like something that ought to be done already in
version 1 of the checkin...

  * And more assembly coding.
I'll be happy to contribute here.  :-)

  > To lower the bar for inclusion, we could keep it disabled by default,
  > and enable it only if some encouraging parameter is set in gmp-mparam.h.
  Or an explicit ./configure time option.

That too.  

Please encrypt, key id 0xC8601622

More information about the gmp-devel mailing list