# Small-base modexp

Niels Möller nisse at lysator.liu.se
Thu Oct 3 06:39:25 UTC 2019

```"Marco Bodrato" <bodrato at mail.dm.unipi.it> writes:

> I implemented it with two variants of that portion:
>
> #ifdef BRANCHLESS
>               int cond1 = (r0 >= m0);
>               int cond2 = (r0 > (m0 >> 1));
>
>               r0 -= m0 & - (mp_limb_t) cond1;
>               r0 <<= 1;
>               r0 -= m0 & - (mp_limb_t) (cond1 ^ cond2);
> #else
>               if (r0 <= (m0 >> 1))
>                 r0 <<= 1;
>               else if (r0 >= m0)
>                 r0 = (r0 - m0) << 1;
>               else
>                 r0 = (r0 << 1) - m0;
> #endif

What's the range of r0? To me it looks like the last two cases could
overflow if r0 is much larger than m0, so I take it there's some limit?

If r0 <= m0, one could the modular-add-via-sub trick, which reduces the
number of conditions from two to one:

neg_r0 = m0 - r0;
mask = - (mp_limb_t) (neg_r0 > r0);
r0 = r0 - neg_r0 + (mask & m0);

which in the doubling case could be simplified to

mask = - (mp_limb_t) (r0 >= m0 - r0); /* 2 r0 >= m0 */
r0 = (r0 << 1) - (mask & m0);

Output is in the range r0 <= m0, with equality only if we had r0 = m0 as
input.

If you use a less canonical representation to speed up the modular
squarings, I see two cases: If 2*m0 fits in a limb, and you can make
squaring work with values in the range 0 <= x <= 2*m0, then you can do
doubling mod 2*m0 in the same way (this is related to David Harvey's
trick in the context of small-prime fft).

But if m0 has the most significant bit set, one would need some other
trick. If you don't find something clever enough, the easiest way to
organize it may be to work with canonical representation, and
canonicalize the output from the modular squaring. Then the above
doubling should work fine.

Regards,
/Niels

--
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
```