mpn_mulmod_bnm1
Torbjorn Granlund
tg at gmplib.org
Wed Apr 2 17:33:37 UTC 2014
nisse at lysator.liu.se (Niels Möller) writes:
I don't remember any fundamental difficulties.
Except the 2^m-1 one.
* Tweaking the roots to compute mod 2^m+1 rather than 2^m-1 (since 2^m-1
is better handled by splitting in mulmod_bnm1).
Perhaps having both variants would be useful, a CRT of 2^m+1 and 2^m-1
is particularly easy, and would perhaps result in faster general
multiply than mpn_mulmod_bnm1 approach of roughly 2^m+1, 2^{m/2}+1,
2^{m/4}+1, ... But if the cost is duplicated tables of roots-of-unity,
then it might not be worth it.
* Implementing David Harvey's trick: slightly smaller primes, but
butterflies sped up from 9-10 or so cycles down to 6 (this was
benchmarked on Opteron, where 6 is the limit from multiplier
throughput).
That indeed seems like something that ought to be done already in
version 1 of the checkin...
* And more assembly coding.
I'll be happy to contribute here. :-)
> To lower the bar for inclusion, we could keep it disabled by default,
> and enable it only if some encouraging parameter is set in gmp-mparam.h.
Or an explicit ./configure time option.
That too.
Torbjörn
Please encrypt, key id 0xC8601622
More information about the gmp-devel
mailing list