fixed-size mpn_mul_n for small n?

Sun Feb 12 20:06:20 CET 2012

Zimmermann Paul <Paul.Zimmermann at loria.fr> writes:

> I am trying to optimize the modular multiplications and squarings in GMP-ECM
> (where we use Montgomery's reduction).

For these moderate sizes, does it pay off to precompute a full inverse
for the montgomery reduction, rather than using redc_1 or redc_2? (I
don't quite understand which lines in you benchmark data I should look
at).

Have you tried the "bidirectional" trick we discussed a while ago (IIRC
it was your idea): For size n, instead of the standard montgomery
representation x' = B^n x (mod m), use the representation x' = B^{n/2} x (mod
m), and each time a size 2n product is to be reduced, cancel n/2
limbs from the right, and n/2 from the left? For the _1 version, it
might even make sense to make a single loop working from both ends.

And back to the original question: I guess you could try to completely
unroll the multiplication for some size of interest, and compare to the
general mpn_mul_basecase.

My understanding is that current branch predictors are fairly good at
the case of a loop which always runs for the same number of iterations,
so the loop overhead (even without unrolling) shouldn't be severe.
Totally unrolled code might have a greater potential for speedups for
squaring, mullo and mulhi, than for mul_basecase, since the former
probably are less friendly to the branch prediction.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.