Toom-8 testing (mpz level)

Wed Oct 7 14:23:39 CEST 2009

>   Dear developers,
>                               I implemented a mpz version of Toom-8
> algorithm for multiplication. On my architecture I obtained someway
> interesting results. You can find two graphics, for multiplication and
> squaring, at
>
>   http://bodrato.it/papers/zanoni.html#CIVV2009b
>
> I think you want to say "Generally speaking, only the degree 1
> (Karatsuba), 2 and 3 ...", if you're speaking of polynomial degree
> (which you seem to be doing since toom8 is said to use degree 7).

Hi, 
       actually I was wrong with the link, sorry

http://bodrato.it/papers/zanoni.html#SYNASC2009

> Why is the code slower in the 7000-9000 limb interval?  I suspect it
> might be due to improper MUL_*_THRESHOLD values for the recursive
> products.  Did you "make tune" on your machine and recompile with the
> generated results?

Yes, I did it. It's slower because FFT enters the game at 7168. The graphic 
shows that , at least in my case, it is faster than Toom-8 in 7500-9000 limb 
interval (for squares, the threshold for FFT is 3840).

>   (under "Toom-8 way for long integers multiplication" - thanks Marco !)
>
>   As I'm waiting for the publication of the proceedings of the SYNASC 2009
>   conference, where the related paper was presented, currently I cannot put
> the paper itself, with all the details, on the web, but actually Marco is
> working quite ahead of it, and will surely produce something better at mpn
> level, with many new ideas, so I think that the paper is actually already
> obsolete.
>
> I suppose that's a common problem.  I hope you can put an up-to-date
> version on the web at some point?

I hope so, too, but probably Marco can do it better.

>   Just to point out that Toom-Cook-k methods with k as high as 8 (may be
> even more ?) can be effective and worth implementing.
>
> While I very much appreciate your work, I am not sure I am happy about
> its outcome: many toom functions will be needed for optimal performance.
>
> :-)

You're perfectly right: quite a lot of effort still waiting.

> Lots of unbalanced toomMN also ask to be implemented:
>
> Using 15 evaluation points: toom97, toomA6, toomB5, toomC4, toomD3.
>
> We currently have the complete set of toom function for 7 evaluation
> points and less (toom22, 32, 33, 42, 43, 52, 44, 53, 62) but 43, 52, 62,
> and 53 are not actually used.
>
> On my most-wanted list is 8-point evaluation (at the mpn level).  I
> suspect the evaluation point ±1, ±1/2, ±2, 0, inf would be best, but I
> suppose you understand that better than me.  8-point would give toom54,
> 63, 72, but 72 is probably not too useful.  I am very curious how toom63
> would compare to toom42, given that 6/3 = 4/2.
>
> My ongoing measurements of multiplication primitives (please see the
> pictures at http://gmplib.org/devel/) indicate that toom62 is not
> important.

I've seen the pictures and discussed then with Marco yesterday. It's really 
difficult to detect an a priori way to separate regions for the different 
algorithms.

> Applying toom32 or toom42 partially is as fast or faster in 
> toom62's applicable interval.  If we implement separate evaluation, we
> could handle these very unbalanced operands more efficiently by
> evaluating the shorter operand just once, then reuse it for multiple
> toom32/42 invocations.  This should give a nice speedup.

I think so as well.

> Have people though about unbalanced FFT operands?  FFT works naturally
> for arbitrarily unbalanced operands, but it gets less and less
> efficient; the run time is a function f of the product, where f is
> C*n*log(n)*log(log(n)) and n is the output size.
>
> I suspect (say) toom42 calling fft will be faster for operands whose
> sizes are the the right relation.  Of course, one would want to
> transform the smaller operand just once, like above for partial toom.