# Codelets for ToomN1 (for N=2, 3, 4, 6, 8) should be added and here's why. (Also: a significant non-triviality on where cut-off points should be).

paul zimmermann Paul.Zimmermann at inria.fr
Wed Apr 4 07:52:27 UTC 2018

```       Dear Marco,

I'm not sure either to understand what Rock means.

One possible interpretation is the following: the simple threshold mechanism
used by GMP might be suboptimal.

Consider for example MUL_TOOM22_THRESHOLD=10 on my computer (with make tune).
We know the ratio mpn_toom22_mul/mpn_mul_basecase is not monotonic. It might
be that mpn_toom22_mul is faster for n=8, and mpn_mul_basecase is faster for
n=15. What Rock suggests (if I understand correctly) is to replace the
threshold mechanism by some finer scheme, for example in toom22_mul.c:

#define TOOM22_MUL_N_REC(p, a, b, n, ws)				\
do {									\
if (best_mul[n] == MUL_BASECASE)					\
mpn_mul_basecase (p, a, n, b, n);					\
else								\
mpn_toom22_mul (p, a, n, b, n, ws);				\
} while (0)

where best_mul[] would be a table saying which is the best method to use.

We already use such a scheme in MPFR for the short multiplication, where the
table gives the best cutoff k in Mulders' algorithm for a given size n.

Since MUL_TOOM8H_THRESHOLD=214 on my computer, you don't need to have a large
table to cover all Toom variants.

However, for the unbalanced multiplication, one would need a bi-dimensional
table, which would be more expensive to store.

Paul
```