Codelets for ToomN1 (for N=2, 3, 4, 6, 8) should be added and here's why. (Also: a significant non-triviality on where cut-off points should be).
paul zimmermann
Paul.Zimmermann at inria.fr
Wed Apr 4 07:52:27 UTC 2018
Dear Marco,
I'm not sure either to understand what Rock means.
One possible interpretation is the following: the simple threshold mechanism
used by GMP might be suboptimal.
Consider for example MUL_TOOM22_THRESHOLD=10 on my computer (with make tune).
We know the ratio mpn_toom22_mul/mpn_mul_basecase is not monotonic. It might
be that mpn_toom22_mul is faster for n=8, and mpn_mul_basecase is faster for
n=15. What Rock suggests (if I understand correctly) is to replace the
threshold mechanism by some finer scheme, for example in toom22_mul.c:
#define TOOM22_MUL_N_REC(p, a, b, n, ws) \
do { \
if (best_mul[n] == MUL_BASECASE) \
mpn_mul_basecase (p, a, n, b, n); \
else \
mpn_toom22_mul (p, a, n, b, n, ws); \
} while (0)
where best_mul[] would be a table saying which is the best method to use.
We already use such a scheme in MPFR for the short multiplication, where the
table gives the best cutoff k in Mulders' algorithm for a given size n.
Since MUL_TOOM8H_THRESHOLD=214 on my computer, you don't need to have a large
table to cover all Toom variants.
However, for the unbalanced multiplication, one would need a bi-dimensional
table, which would be more expensive to store.
Paul
More information about the gmp-discuss
mailing list