Replacement mpn/generic/mul.c

Torbjorn Granlund tg at gmplib.org
Sun Dec 20 12:42:44 CET 2009

```bodrato at mail.dm.unipi.it writes:

I did not read the code carefully, but I have two comments (maybe I'm
missing the point, tell me if this is the case):

1) Should we care possible overflow of the mp_size_t operands? I guess
this is not a problem for 64-bits CPU, but what about 32-bits? (or less??)
We obviously can assume un+vn <= MP_SIZE_T_MAX, but what about the
comparison (un > 8 * vn) ? It may give an incorrect result if e.g.
un=vn=(MP_SIZE_T_MAX+1)/8. Are this operands too big to care about?

We promise 2^32-1 *bits* as the largest operands.  As long as we don't
multiply the largest operands' sizes by something >32, we should be
fine.

2) I suggested the use of an iterative product accumulation also for
operands above the FFT size, but I'm not sure that the use of toom63 for
this huge operands is a good idea. We should probably iterate an
FFT(3*vn,vn), or something like that.

In conclusion, I'd substitute the to line 219:
else if (BELOW_THRESHOLD((un+vn)>>1,MUL_FFT_THRESHOLD) || un>8*vn)
with
else if (BELOW_THRESHOLD((un+vn)>>1,MUL_FFT_THRESHOLD) ||
BELOW_THRESHOLD(2*vn,MUL_FFT_THRESHOLD))

moreover I'd add another iterative block at lines 321-323 like
if (vn < MP_SIZE_T_MAX/8 && un >= 8*vn) {

We multiply m x n, where m >> n.

Using FFT(3n,n):
Take 3n x n pieces, roughly m/(3n) times.
Each 3n x n piece needs time O(M(4n)).
Totally O(M(4n) m/(3n))

Using toom63:
Take 2n x n pieces, roughly m/(2n) times.
Each 2n x n piece needs time O(8 M(n/3)).
Totally O(M(n/3) 4m/n)

The function M(n) is superlinear, so:
M(n/3) 4m/n < M(4n/3) m/n < M(4n) m/(3n)

So it seems toom63 is better.  Am I missing something?

One more random idea: maybe in the ToomX2 block you can _SDECL, _SMARK,
_SALLOC...

Good idea!

--
Torbjörn
```