Reason for definition of precomputed reciprocals
Niels Möller
nisse at lysator.liu.se
Thu Jun 25 16:41:50 CEST 2026
Albin Ahlbäck <ahlback at lix.polytechnique.fr> writes:
> One has
>
> (B + 1 + v) d = B^2 - k,
>
> where 0 <= k < d. For division, one would have
>
> q = floor(n (v + B + 1) / B^2),
> r = m - q d,
>
> with 0 <= r < 2 d, so only if-switch would be required.
But then it's possible that r >= B, so the needed test for r >= d is not
a simple single-word comparison. The main point of our algorithms for
2/1 and 3/2 division is to only have to compute the low half of q d.
This avoids a cycle of carry propagation latency when computing the
candidate r, as well as the cost of computing the high half of the q * d
product. And instead, the "q_0" fraction word is needed to sort things
out.
That's a tradeoff that seems pretty good in the GMP setting. For other
settings, there are likely better ways. E.g., I've recently been looking
into "post-quantum" lattice-based algorithms, and need modulo for some
fixed small d's. Then I settled on using the reciprocal
v = ceil (2^32 / d)
and I can then compute x mod d (where x is 32 bits and d is 16 bits)
using something like
uint32_t q = ((uint64_t) v * x) >> 32
uint32_t r = u - q*d; /* Interpreted as two's complement, |r| < d */
r += (r >> 16) & d;
Regards,
/Niels
--
Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677.
Internet email is subject to wholesale government surveillance.
More information about the gmp-devel
mailing list