Toom-8 testing (mpz level)
Alberto Zanoni
zanoni at volterra.uniroma2.it
Wed Oct 7 14:23:39 CEST 2009
> Dear developers,
> I implemented a mpz version of Toom-8
> algorithm for multiplication. On my architecture I obtained someway
> interesting results. You can find two graphics, for multiplication and
> squaring, at
>
> http://bodrato.it/papers/zanoni.html#CIVV2009b
>
> I think you want to say "Generally speaking, only the degree 1
> (Karatsuba), 2 and 3 ...", if you're speaking of polynomial degree
> (which you seem to be doing since toom8 is said to use degree 7).
Hi,
actually I was wrong with the link, sorry
http://bodrato.it/papers/zanoni.html#SYNASC2009
> Why is the code slower in the 7000-9000 limb interval? I suspect it
> might be due to improper MUL_*_THRESHOLD values for the recursive
> products. Did you "make tune" on your machine and recompile with the
> generated results?
Yes, I did it. It's slower because FFT enters the game at 7168. The graphic
shows that , at least in my case, it is faster than Toom-8 in 7500-9000 limb
interval (for squares, the threshold for FFT is 3840).
> (under "Toom-8 way for long integers multiplication" - thanks Marco !)
>
> As I'm waiting for the publication of the proceedings of the SYNASC 2009
> conference, where the related paper was presented, currently I cannot put
> the paper itself, with all the details, on the web, but actually Marco is
> working quite ahead of it, and will surely produce something better at mpn
> level, with many new ideas, so I think that the paper is actually already
> obsolete.
>
> I suppose that's a common problem. I hope you can put an up-to-date
> version on the web at some point?
I hope so, too, but probably Marco can do it better.
> Just to point out that Toom-Cook-k methods with k as high as 8 (may be
> even more ?) can be effective and worth implementing.
>
> While I very much appreciate your work, I am not sure I am happy about
> its outcome: many toom functions will be needed for optimal performance.
>
> :-)
You're perfectly right: quite a lot of effort still waiting.
> Lots of unbalanced toomMN also ask to be implemented:
>
> Using 15 evaluation points: toom97, toomA6, toomB5, toomC4, toomD3.
>
> We currently have the complete set of toom function for 7 evaluation
> points and less (toom22, 32, 33, 42, 43, 52, 44, 53, 62) but 43, 52, 62,
> and 53 are not actually used.
>
> On my most-wanted list is 8-point evaluation (at the mpn level). I
> suspect the evaluation point ±1, ±1/2, ±2, 0, inf would be best, but I
> suppose you understand that better than me. 8-point would give toom54,
> 63, 72, but 72 is probably not too useful. I am very curious how toom63
> would compare to toom42, given that 6/3 = 4/2.
>
> My ongoing measurements of multiplication primitives (please see the
> pictures at http://gmplib.org/devel/) indicate that toom62 is not
> important.
I've seen the pictures and discussed then with Marco yesterday. It's really
difficult to detect an a priori way to separate regions for the different
algorithms.
> Applying toom32 or toom42 partially is as fast or faster in
> toom62's applicable interval. If we implement separate evaluation, we
> could handle these very unbalanced operands more efficiently by
> evaluating the shorter operand just once, then reuse it for multiple
> toom32/42 invocations. This should give a nice speedup.
I think so as well.
> Have people though about unbalanced FFT operands? FFT works naturally
> for arbitrarily unbalanced operands, but it gets less and less
> efficient; the run time is a function f of the product, where f is
> C*n*log(n)*log(log(n)) and n is the output size.
>
> I suspect (say) toom42 calling fft will be faster for operands whose
> sizes are the the right relation. Of course, one would want to
> transform the smaller operand just once, like above for partial toom.
More information about the gmp-devel
mailing list