# Replacement mpn/generic/mul.c

bodrato at mail.dm.unipi.it bodrato at mail.dm.unipi.it
Sun Dec 20 12:03:50 CET 2009

```Dear Torbjorn,

> In the end, I decided for a slightly more complex mul.c, adding a few
> more threshold parameters for unbalanced toom primitives.  The result is
> a bit more complex that desirable.  The speedup of the new code is
> however considerable.

Great work!

I did not read the code carefully, but I have two comments (maybe I'm
missing the point, tell me if this is the case):

1) Should we care possible overflow of the mp_size_t operands? I guess
this is not a problem for 64-bits CPU, but what about 32-bits? (or less??)
We obviously can assume un+vn <= MP_SIZE_T_MAX, but what about the
comparison (un > 8 * vn) ? It may give an incorrect result if e.g.
un=vn=(MP_SIZE_T_MAX+1)/8. Are this operands too big to care about?

2) I suggested the use of an iterative product accumulation also for
operands above the FFT size, but I'm not sure that the use of toom63 for
this huge operands is a good idea. We should probably iterate an
FFT(3*vn,vn), or something like that.

In conclusion, I'd substitute the to line 219:
else if (BELOW_THRESHOLD((un+vn)>>1,MUL_FFT_THRESHOLD) || un>8*vn)
with
else if (BELOW_THRESHOLD((un+vn)>>1,MUL_FFT_THRESHOLD) ||
BELOW_THRESHOLD(2*vn,MUL_FFT_THRESHOLD))

moreover I'd add another iterative block at lines 321-323 like
if (vn < MP_SIZE_T_MAX/8 && un >= 8*vn) {

} else

To solve the possible overflows of issue 1) one should also change the (2
* un >= 5 * vn) comparison in block ToomX3 with ( un > (5 * vn)>>1),
because we know vn is small, but un can be big; and change also the
TOOM44_OK macro

One more random idea: maybe in the ToomX2 block you can _SDECL, _SMARK,
_SALLOC...

--
http://bodrato.it/

```