Codelets for ToomN1 (for N=2, 3, 4, 6, 8) should be added and here's why. (Also: a significant non-triviality on where cut-off points should be).

Wed Apr 4 06:23:56 UTC 2018

Ciao,

First of all, I'm not sure I can consider this a bug report; therefor I
reply on the gmp-discuss list.

Il Mar, 27 Marzo 2018 1:22 am, Rock Brentwood ha scritto:
> It may be possible to eliminate the need for a "basecase" (except for 1x1
> limb multiplication, of course) by including codelets for Toom21, Toom31,
> Toom41, Toom61, Toom81, etc.

What do you mean with ToomN1?

In my understanding, Toom means:
 - looking at operands as polynomials,
 - evaluate in a given set of points,
 - operate (multiply) on evaluations,
 - interpolate to obtain the result.

Let's consider Toom-6x1.
We have Toom-4x3, 6 multiplications and some linear algebra, instead of
4*3=12 multiplications.
We have Toom-5x2, 6 multiplications and similar linear algebra, instead of
5*2=10 multiplications. The more the operands are unbalanced, the less
Toom is an improvement.
What about Toom-6x1? Why should we add complex linear algebra, if we can
directly multiply the 6 slices of the longer operand by the single part of
the shorter one?

Why do you skip N=5 and N=7?

> In many cases, the recursive decompositions
> done through these codelets will end up falling *back* into the other
> codelets!

Also here, I'm not sure I understand. Assume Toom-4x1 recursion falls into
the Toom-2x2 range... isn't it better to directly use Toom-4x2 twice? The
latter is the strategy used by current GMP. Look e.g. at the code in
mpn/generic/mul.c, the lines around the following code:

  while (un >= 3 * vn)
    {
      mpn_toom42_mul (ws, up, 2 * vn, vp, vn, scratch);
      un -= 2 * vn;
      up += 2 * vn;
      ....

Another option can be exploiting the invariance of the short operand when
slices of the longer one are handled; the idea is older, but about that
subject I can suggest the paper by Alberto Zanoni "Iterative Toom-Cook
methods for very unbalanced long integer multiplication".

> I included an ASCII picture showing the cut-off points for multiplication
> that uses Toom11 (O) Toom21 (.) and Toom22 (o). Hopefully the mailer will
> not cut off the diagram.
>
> In some places (:), a balance point between Toom21 and Toom22 is reached.

I cut a 10x10 portion.

- 1234567890
1 O.........
2 .o........
3 ..oo..:...
4 ..oo:.....
5 ...:oooo..
6 ....oooo:.
7 ..:.ooooo.
8 ....ooooo:
9 .....:oooo
0 .......:oo

How do you apply Toom-2x2 in the 7x3 case? Where one operand is more than
twice as long as the other one?

Did you derive your picture from some implementation or it is only
theoretical? In the latter case, what does your theory count? And when do
you consider that an algorithm is better than another one?

Ĝis,
m

-- 
http://bodrato.it/software/toom.html