Replacement mpn/generic/mul.c

Sun Dec 20 12:42:44 CET 2009

bodrato at mail.dm.unipi.it writes:

  I did not read the code carefully, but I have two comments (maybe I'm
  missing the point, tell me if this is the case):

  1) Should we care possible overflow of the mp_size_t operands? I guess
  this is not a problem for 64-bits CPU, but what about 32-bits? (or less??)
  We obviously can assume un+vn <= MP_SIZE_T_MAX, but what about the
  comparison (un > 8 * vn) ? It may give an incorrect result if e.g.
  un=vn=(MP_SIZE_T_MAX+1)/8. Are this operands too big to care about?

We promise 2^32-1 *bits* as the largest operands.  As long as we don't
multiply the largest operands' sizes by something >32, we should be
fine.

  2) I suggested the use of an iterative product accumulation also for
  operands above the FFT size, but I'm not sure that the use of toom63 for
  this huge operands is a good idea. We should probably iterate an
  FFT(3*vn,vn), or something like that.

  In conclusion, I'd substitute the to line 219:
    else if (BELOW_THRESHOLD((un+vn)>>1,MUL_FFT_THRESHOLD) || un>8*vn)
  with
    else if (BELOW_THRESHOLD((un+vn)>>1,MUL_FFT_THRESHOLD) ||
             BELOW_THRESHOLD(2*vn,MUL_FFT_THRESHOLD))

  moreover I'd add another iterative block at lines 321-323 like
   if (vn < MP_SIZE_T_MAX/8 && un >= 8*vn) {

We multiply m x n, where m >> n.

Using FFT(3n,n):
Take 3n x n pieces, roughly m/(3n) times.
Each 3n x n piece needs time O(M(4n)).
Totally O(M(4n) m/(3n))

Using toom63:
Take 2n x n pieces, roughly m/(2n) times.
Each 2n x n piece needs time O(8 M(n/3)).
Totally O(M(n/3) 4m/n)

The function M(n) is superlinear, so:
M(n/3) 4m/n < M(4n/3) m/n < M(4n) m/(3n)

So it seems toom63 is better.  Am I missing something?

  One more random idea: maybe in the ToomX2 block you can _SDECL, _SMARK,
  _SALLOC...

Good idea!

-- 
Torbjörn