Torbjorn Granlund tg at gmplib.org
Tue Dec 8 08:11:07 CET 2009

```nisse at lysator.liu.se (Niels Möller) writes:

Torbjorn Granlund <tg at gmplib.org> writes:

> The wrapping multiplication for mpn_invert and mpn_binvert are of the
> kind 2n x n.  Using mulmod_bnm1 for that is not that great.

But not so bad, either, is it?

I suppose we could write a special mulmod_bnm1 for the degenerate case
where 2bn < rn.

The multiplication is 2n (input) x n (previous iterate). Product is of
size 3n, where n limbs at the most significant end (for mpn_invert) are
known a priori to be all ones (or, since we don't quite converge from
below due to truncation at this step in the algorithm, they can be
either B^n or B^n-1, so one needs an extra limb to figure out which one
it is). Hence one can use mulmod_bnm1 with rn = 2n + 1, an = 2n, bn = n.

I'd suggest to use also an=2n+1 (whether we use some mulmod or plain
mul).

In the FFT range, wraparound arithmetic should save roughly 30% of the
work compared to a full multiplication, but it's not clear to me what
saving one can expect for smaller sizes.

Here "FFT range" refers to something like 1.5n < MUL_FFT_MODF_THRESHOLD,
I suppose.  MUL_FFT_MODF_THRESHOLD is vastly smaller than
MUL_FFT_THRESHOLD.  http://gmplib.org/devel/MUL_FFT_MODF_THRESHOLD.html

--
Torbjörn
```