Wraparound multiplicaton vs mullo
tg at gmplib.org
Sat Oct 3 21:41:09 CEST 2009
I have a few comments about this.
This is very cool! I think this will beat Mulder for mullo as soon as
operands get beyond the basecase range. This is going to be useful for
Newton and cancelling operations such as those used for gcd.
But it might be more tricky for the Schönhage-Strassen FFT (SSFFT)
dyadic product. It wants to compute mod x^n+1, not x^n-1 which is
redicible over Q. The SSFFT basecase could however be implemented using
a conventional FFT over C, where x^n+1 is of course reducible. Pierrick
Gaudry and myself started such a project a few years back, but we never
got further than to some inner loops and a few optimistic performance
estimates. (One should realize in this context that arithmetic over Q
approximated with 64-bit hardware floats is relatively more powerful for
32-bit machines than for 64-bit machines.)
The way I deal with branch prediction problems for triangular operations
is zealous unrolling. I work from the wide part towards the tip. I
also implement completely unrrolled code for the tip, since there one
can avoid many branch mispredictions at the cost of limited extra code.
More information about the gmp-devel