Paul.Zimmermann at inria.fr
Sun May 2 16:38:39 UTC 2021
> IIRC, Svoboda's division trick for N/D is to find a small multiplier m
> such that, for the division mN/(mD) we have mD = 100000...XX... with the
> high limb of mD being 1000...0.
> This idea works also for 2-adic division. Find m = D^(-1) mod \beta
> where \beta is the lomb base. Then do mN/(mD) or mN mod (mD) with
> 2-adic norm. Now mD will end in 0...0001, and the quotient limbs will
> be the low limbs of the intermediate remainder.
> That should mean that we could use mul_basecase directly for
> sbpi1_bdiv_r (or its sibling redc_1).
> Surely this is not a new observation.
yes, see Section 2.4.2 of Modern Computer Arithmetic, where we call it
"Montgomery-Svoboda". The quotient selection becomes trivial, which means
one can reduce the latency between two mpn_addmul_1 calls.
If you reduce k limbs at a time, by precomputing m = D^(-1) mod \beta^k,
then you can use mpn_addmul_k at each step. But to reduce the last k limbs,
you need to revert to classical (Montgomery) division, since mN has n+k limbs.
More information about the gmp-devel