Small-base modexp
Marco Bodrato
bodrato at mail.dm.unipi.it
Wed Oct 2 23:40:46 UTC 2019
Ciao,
Il Mer, 2 Ottobre 2019 3:29 pm, Torbjörn Granlund ha scritto:
> "Marco Bodrato" <bodrato at mail.dm.unipi.it> writes:
> I wrote the doubling-modulo step with 3 possible operations:
> if it can not overflow, simply lshift;
> otherwise choose between (x*2-m) and ((x-m)*2)
>
> Do you choose that at each step? That might be a cause for delay due to
> branch miss.
Yes, at each step.
I implemented it with two variants of that portion:
#ifdef BRANCHLESS
int cond1 = (r0 >= m0);
int cond2 = (r0 > (m0 >> 1));
r0 -= m0 & - (mp_limb_t) cond1;
r0 <<= 1;
r0 -= m0 & - (mp_limb_t) (cond1 ^ cond2);
#else
if (r0 <= (m0 >> 1))
r0 <<= 1;
else if (r0 >= m0)
r0 = (r0 - m0) << 1;
else
r0 = (r0 << 1) - m0;
#endif
I did not measure them very carefully, but the BRANCHLESS variant seems
slower.
> Let me make sure I understand what you're comparing.
>
> Are you comparing mpz_powm with base = 2 with your mew special code for
> base = 2 for size = 1?
The current mpz_powm code has (basically) the same speed for any base. 2,
3, 0xffffffffff or whatever.
I added a special function, that is called by mpz_powm when the base is 2.
When size=1 this function uses an ad-hoc loop directly operating on limbs,
for the other sizes the special function uses functions from the mpn_
layer.
> overhead 0.000000002 secs, precision 10000 units of 2.86e-10 secs, CPU
> freq 3500.08 MHz
> mpz_powm mpz_powm.2 mpz_powm.3
> 1 0.000004322 #0.8082 0.9884
> 2 0.000007201 1.0723 #0.9836
> 3 0.000013274 #0.9465 0.9928
> 56 0.001738323 #0.7941 1.0011
The current code, with a small base (eg. 3) is basically as fast as with a
random base (maybe 1% difference, maybe not).
For size=1, my specialised function with base=2 is 20% faster.
For small sizes (eg.3), my specialised function (base=2) is slightly
faster (5%).
For larger sizes (eg.56), my specialised function (base=2) is sensibly
faster (20%).
For size=2, my specialised function (base=2) is slower, at least on some
architectures.
> I'd expect to see significant speedup for slower functions like gcd,
> powm and jacobi from specialisation.
powm may have different specialisations.
With respect to size (if we write a specialised powm function for a
generic base and a modulus of size 2 limbs) then my function may loose
even harder for that size...
> Checking for particular operand values in general adds more overhead
> than is possible to get back. For for slower functions it is worth it.
With the new composite test, a specialisation for base=2 makes sense...
Ĝis,
m
--
http://bodrato.it/papers/
More information about the gmp-devel
mailing list