2-adic Svoboda

Mon May 3 06:31:46 UTC 2021

       Dear Torbjörn,

>   yes, see Section 2.4.2 of Modern Computer Arithmetic, where we call it
>   "Montgomery-Svoboda". The quotient selection becomes trivial, which means
>   one can reduce the latency between two mpn_addmul_1 calls.
> 
> It really becomes a mul_basecase except that the first round there is
> done with mul_1.  If we replace that mul_1 by addmul_1, and
> then called the resulting function like this,
> 
> mul_basecase'(rp, up, un, rp, rn-un)
> 
> then we will get the desired remainder at mul_basecase speed.

a slight difference is that here the limb multiplier is not known in advance,
but only at the beginning of each loop. But maybe it does not make any
difference.

I tried to implement Montgomery-Svoboda at the C level, but did not manage
to beat the mpn_redc_x routines. I'm very interested to see your results!

Best regards,
Paul