bipartite modular multiplication
Paul.Zimmermann at loria.fr
Wed Apr 29 10:39:43 CEST 2009
> You replace a single 2n by n division by two n by n/2 divisions. I
> imagine that should give some speedup in most size ranges? In the
> schoolbok-range (not sure what typical thresholds are) it makes some
> sense to me mix the two loops in a single assembly function. When
> getting into the divide-and-conquer range, that seems harder (and
> I think threading at this level is not going to get into gmp).
my question was mainly for the schoolbok range.
> For the important application of modular exponentiation, one ought to
> precompute an classical n/2 inverse of the high half of N and a n/2
> modular inverse of the low part.
this is for the divide-and-conquer range. In the schoolbok-range, you only need
a one-limb inverse, or you can use Svoboda's algorithm to reduce n/2-1 limbs.
> The other REDC application I'm aware of is small-prime fft; I wonder
> if this trick could make it useful to do small-prime fft with primes
> that are two limbs? Given that the low limb of a suitable prime should
> be 1, the redc-part should be trivial. If I get this right, for a
> segment of the FFT that processes four limbs, it's four 1x1->2
> products and four 2->1 redc vs one 2x2->4 multiplications (also four
> 1x1->2) and a single 4->2 reduction as above.
I was not aware of that application. Nice. Another application is of course
mpz_powm (i.e., RSA encryption or decryption), which might speed up gmp-bench.
More information about the gmp-devel