Toom-Cook 4x3 and 5x2 multiplication: Toom3.5

Thu Nov 23 09:20:29 CET 2006

       Marco,

> Toom3.5 is faster than mpz_mul from 120x90 product, and up to 4500x3375.
> Saving up to 35%. Then, Toom7 is still 5% faster than FFT, up to 6500x4875.
> This Toom3.5 implementation is that fast also because it recurses on Toom7
> for balanced sub-multiplications.
> 
> Here are some timings: [...]

These timings are great!

> Notes:
> - Toom3.5 will be faster than Toom4, for any size, and this is obvious...

Sorry it is not obvious to me (and maybe to other readers of gmp-devel).
Please can you elaborate?

> - mpz_tc3 is sometimes (in the 1600-3600 range) faster than mpz_mul.
> This may indicate that padding can give worst result than management of
> different sizes.

Isn't it the contrary? mpz_mul does manage different sizes, and I guess you
perform padding in mpz_tc3.

> PS: after a good sleep... I changed the code again, now mpz_tc3_5 performs
> all we called Toom3.5 in the paper. I mean, it implements both 4x3 and 5x2
> computation, chosing at run-time. For the recursive multiply A(Inf)*B(Inf)
> it recurse to itself, because the initial factors could be not exactly
> 5x2 or 4x3, so this product can be unbalanced again!

Yes, this is an interesting open question. Toom2.5 for example is optimal
for operand sizes of un=3n and vn=2n, whereas Toom3 is optimal for 3n and 3n.
In which interval for un/vn should we prefer Toom2.5 to Toom3?

Another question you mention here is the following: in the mpn_mul() function,
which handles operands of different sizes, should we pad operands before
calling the Toomxxx functions, or should those functions deal with operands
sizes that are not exactly in the corresponding ratio? Consider for example
Toom2.5: my code only works for un=3n and vn=2n, i.e. un/vn=3/2. We could
extend it to 1 < un/vn < 3, by calling recursively mpn_mul() instead of
mpn_mul_n() for the product of the largest terms, that you call A(Inf)*B(Inf).

Paul