Allocation for toom33

Sun Oct 25 20:08:11 CET 2009

Torbjorn Granlund <tg at gmplib.org> writes:

> nisse at lysator.liu.se (Niels Möller) writes:
>
>   One other observation: They all have large s and small t. Is there any
>   input sizes where it's optimal to choose a toom varaiant that results
>   in a small s?
>   
> For toom32 I get these points with small s:
>
>  an  bn    s  t   n
>  26  22:   4  11  11
>  27  24:   3  12  12
>  29  26:   3  13  13
> 100  95:   4  47  48
> 100  96:   4  48  48

Interesting. For these cases, using toom33 instead would give a
smaller n. E.g., consider the last one, 100x96.

With toom32, it's three 48x48 and one 48x4.

With toom33, we would get n = 34, s = 32, t = 28, hence four 34x34 and
one 32x28.

With schoolbook, it's 7104 for toom32 and 5520 for toom33, which looks
like a pretty large difference to me. If we take into account that
Karatsuba is used for the recursive calls, I get 5376 for toom32 and
4172 for toom33. So more than 1000 additional limb multiplications for
toom32.

So if toom32 wins despite this, it must be due to the additional
linear cost of toom33. Interpolation alone is some 14 or so linear operations
on 69-limb numbers for toom33 (including one divby3), and 7 or so for
toom32, working on 96-limb numbers, and evaluation is slightly more
expensive too.

Anyway, it looks like we can't rule out small s, that case seem to be
important for performance at least for sizes up to a few hundred
limbs. It would be interesting to know the next best algorithm, and
margin, for the small s cases above.

Regards,
/Niels