Toom-3.5 (aka Toom-4x3 & Toom-5x2)

Sun Feb 8 10:33:51 CET 2009

Dear developers,

I wrote the code for the unbalanced Toom-3.5, id est:
mpn_toom_interpolate_6pts, mpn_toom43_mul, mpn_toom52_mul.

You can find it on my web page: http://bodrato.it/software/toom.html#TC3.5

I developed and tested it on a Pentium3, but it contains also UNTESTED
code for architectures where HAVE_NATIVE_mpn_rsh1add_n and
HAVE_NATIVE_mpn_addsub_n are defined... If someone can test it on other
processors too, I'll be happy.

Is it useful? If we want to use Toom-4 with unbalanced operands, we have
the 3 flavours 4x4, 5x3 and 6x2; we have to choose among the three
according to the degree of unbalance.

For balanced operands: (e.g. 560x560)
          |______a3______|______a2______|______a1______|______a0______|
|______b3______|______b2______|______b1______|______b0______|
we obviously use 4x4.

For slightly unbalanced: (600x520)
      |_______a3______|_______a2______|_______a1______|_______a0______|
              |__b3___|_______b2______|_______b1______|_______b0______|
4x4 remains the best choice.

Then we reach a limit: (640x480)
  |_______a3_______|_______a2_______|_______a1_______|_______a0_______|
                   |_______b2_______|_______b1_______|_______b0_______|
????

Above that limit: (680x440)
|_a4__|_______a3______|______a2_______|______a1_______|______a0_______|
                      |______b2_______|______b1_______|______b0_______|
we can start using 5x3...

The same happen between 5x3 and 6x2. Using the same example as before, the
case 800x320 is a border case where both 5x3 and 6x2 fail. It is not a
surprise to notice that the two cases of failure of unbalanced Toom-4
perfectly adapt to Toom-3.5, respectively to the 4x3 and to the 5x2
flavour.
This fact is natural: Toom-4 fails for those case because one of the
products nullify, the solution is to fall back on the method with one
interpolation point less, i.e. Toom-3.5.

I conclude that, if we use Toom-4 for unbalanced operands, we _need_
Toom-3.5 for corner cases, and we can use it also to avoid borderline
situations (when we split in (N*n+s)x(M*n+t), the borderline case s+t <= n
imposes irksome coding).
This is general, Toom-2.5 (3x2) should handle borderline situation between
Toom-3 flavours (3x3, 4x2); Toom-3 handle corner cases for Toom-3.5 (4x3.
5x2)... and so on.

Let me know if you test the code.
Marco Bodrato

-- 
http://bodrato.it/