Toom-Cook 4x3 and 5x2 multiplication: Toom3.5

Marco Bodrato bodrato at mail.dm.unipi.it
Thu Nov 23 10:28:21 CET 2006


Dear Paul,

On Thu, 23 Nov 2006 09:20:29 +0100, Paul Zimmermann wrote
> > Here are some timings: [...]
> 
> These timings are great!

On the web page you will also find graphs.

> > - Toom3.5 will be faster than Toom4, for any size, and this is obvious...
> 
> Sorry it is not obvious to me (and maybe to other readers of gmp-devel).
> Please can you elaborate?

Shortly. Toom4 now implements only 4x4. Toom3.5 may be used 4x3 or 5x2.
If you compare 4x4 to 4x3 you will have exactly the same block sizes. In Toom4
we have one more multiplication (which nullify, and can be avoided) and much
more overhead (sums, divisions...). Comparing 5x2 to 4x3, you have the
same number of mul, sum, shift, div, but in 5x2 you have smaller operands.
So, if they are all aplicable to some chosen operands, we have:
time(5x2) < time(4x3) < time(4x4).
Finaly, unbalanced Toom3.5, where applicable, will be faster than balanced Toom4.

> Isn't it the contrary? mpz_mul does manage different sizes, and I guess you
> perform padding in mpz_tc3.

Sorry it is not obvious to me :-) I do not know, I stealed your mpz_tc4 code :-)
I guess you are right.

Regards,
Marco

--
http://bodrato.it/



More information about the gmp-devel mailing list