Replacement mpn/generic/mul.c
Torbjorn Granlund
tg at gmplib.org
Sun Dec 20 12:42:44 CET 2009
bodrato at mail.dm.unipi.it writes:
I did not read the code carefully, but I have two comments (maybe I'm
missing the point, tell me if this is the case):
1) Should we care possible overflow of the mp_size_t operands? I guess
this is not a problem for 64-bits CPU, but what about 32-bits? (or less??)
We obviously can assume un+vn <= MP_SIZE_T_MAX, but what about the
comparison (un > 8 * vn) ? It may give an incorrect result if e.g.
un=vn=(MP_SIZE_T_MAX+1)/8. Are this operands too big to care about?
We promise 2^32-1 *bits* as the largest operands. As long as we don't
multiply the largest operands' sizes by something >32, we should be
fine.
2) I suggested the use of an iterative product accumulation also for
operands above the FFT size, but I'm not sure that the use of toom63 for
this huge operands is a good idea. We should probably iterate an
FFT(3*vn,vn), or something like that.
In conclusion, I'd substitute the to line 219:
else if (BELOW_THRESHOLD((un+vn)>>1,MUL_FFT_THRESHOLD) || un>8*vn)
with
else if (BELOW_THRESHOLD((un+vn)>>1,MUL_FFT_THRESHOLD) ||
BELOW_THRESHOLD(2*vn,MUL_FFT_THRESHOLD))
moreover I'd add another iterative block at lines 321-323 like
if (vn < MP_SIZE_T_MAX/8 && un >= 8*vn) {
We multiply m x n, where m >> n.
Using FFT(3n,n):
Take 3n x n pieces, roughly m/(3n) times.
Each 3n x n piece needs time O(M(4n)).
Totally O(M(4n) m/(3n))
Using toom63:
Take 2n x n pieces, roughly m/(2n) times.
Each 2n x n piece needs time O(8 M(n/3)).
Totally O(M(n/3) 4m/n)
The function M(n) is superlinear, so:
M(n/3) 4m/n < M(4n/3) m/n < M(4n) m/(3n)
So it seems toom63 is better. Am I missing something?
One more random idea: maybe in the ToomX2 block you can _SDECL, _SMARK,
_SALLOC...
Good idea!
--
Torbjörn
More information about the gmp-devel
mailing list