Toom-3.5 (aka Toom-4x3 & Toom-5x2)
bodrato at mail.dm.unipi.it
bodrato at mail.dm.unipi.it
Sun Feb 8 10:33:51 CET 2009
Dear developers,
I wrote the code for the unbalanced Toom-3.5, id est:
mpn_toom_interpolate_6pts, mpn_toom43_mul, mpn_toom52_mul.
You can find it on my web page: http://bodrato.it/software/toom.html#TC3.5
I developed and tested it on a Pentium3, but it contains also UNTESTED
code for architectures where HAVE_NATIVE_mpn_rsh1add_n and
HAVE_NATIVE_mpn_addsub_n are defined... If someone can test it on other
processors too, I'll be happy.
Is it useful? If we want to use Toom-4 with unbalanced operands, we have
the 3 flavours 4x4, 5x3 and 6x2; we have to choose among the three
according to the degree of unbalance.
For balanced operands: (e.g. 560x560)
|______a3______|______a2______|______a1______|______a0______|
|______b3______|______b2______|______b1______|______b0______|
we obviously use 4x4.
For slightly unbalanced: (600x520)
|_______a3______|_______a2______|_______a1______|_______a0______|
|__b3___|_______b2______|_______b1______|_______b0______|
4x4 remains the best choice.
Then we reach a limit: (640x480)
|_______a3_______|_______a2_______|_______a1_______|_______a0_______|
|_______b2_______|_______b1_______|_______b0_______|
????
Above that limit: (680x440)
|_a4__|_______a3______|______a2_______|______a1_______|______a0_______|
|______b2_______|______b1_______|______b0_______|
we can start using 5x3...
The same happen between 5x3 and 6x2. Using the same example as before, the
case 800x320 is a border case where both 5x3 and 6x2 fail. It is not a
surprise to notice that the two cases of failure of unbalanced Toom-4
perfectly adapt to Toom-3.5, respectively to the 4x3 and to the 5x2
flavour.
This fact is natural: Toom-4 fails for those case because one of the
products nullify, the solution is to fall back on the method with one
interpolation point less, i.e. Toom-3.5.
I conclude that, if we use Toom-4 for unbalanced operands, we _need_
Toom-3.5 for corner cases, and we can use it also to avoid borderline
situations (when we split in (N*n+s)x(M*n+t), the borderline case s+t <= n
imposes irksome coding).
This is general, Toom-2.5 (3x2) should handle borderline situation between
Toom-3 flavours (3x3, 4x2); Toom-3 handle corner cases for Toom-3.5 (4x3.
5x2)... and so on.
Let me know if you test the code.
Marco Bodrato
--
http://bodrato.it/
More information about the gmp-devel
mailing list