Toom-8 testing (mpz level)

Wed Oct 7 17:17:01 CEST 2009

nisse at lysator.liu.se (Niels Möller) writes:

  Torbjorn Granlund <tg at gmplib.org> writes:

  > I think the updated graphs might be better.  I've refined the measure-
  > ment code, and updated the underlying GMP.  There are now also diagrams
  > for 3 different machine types, so that one may see patterns better.

  Looks a lot more like sectors now. Some observations:

  Borders are a lot sharper for Core 2 than Athlon.

Yes, considerably so.

I had access to a more modern Athlon64 system for a while, and it had
sharper borders.  But not as sharp as Core i7.

I suspect this is cache-related.  Intel's caches are better organised in
my opinion; they are multi-way where AMD sticks to 2-way L1 caches.  I
think there are more bank conflicts with AMD's caches too.  (A bank
conflict is when two accesses to [very] different addresses cannot
execute in parallel because they hit the same cache bank.)

  In the Athlon figure, the border between toom52 and toom62 and the
  upper border for toom52 partial are sharper than others.

Yes.  There are two kind of blur, I think.

1. Large-scale blur, possibly caused by cache effects or by non-
linearity in sub-products (presumably then point at infinity).  See for
example Athlon's border between toom44 and toom43.

2. Small-scale blur, caused by close to same-speed results.  See for
example Athlon's toom62 vs toom32-partial or both Intel processors'
behaviour of toom53 vs toom42.

  On Core 2 and Pentium 4, the area for toom42 is quite small, almost
  overtaken by toom53. Hmm, you're including toom43 now? Its area is
  smaller than toom44 and toom53.

Note that toom43 is wider for smaller operands.  toom53 is wider for
larger operands.  For all 3 processors.

  The region where FFT wins looks like (an + bn) > THRESHOLD, which
  makes sense (I wouldn't say it's "expected", since this is quite
  complex).

Then I'll say it is expected!  :-)

  And then current FFT-performance is not as smooth as one would wish.

No, but it is smooth for being FFT.

  The region where base case wins looks like bn < THRESHOLD, which is
  nice. I.e., it could use a threshold on the size of the smaller factor
  only.

Yes, but also notice the little tongue of basecase-blue between toom22
and toom32, and the larger tongue between toom32 and toom42.

  Have you tried matching nice rational numbers to the slopes of the
  borders?

Nope.

Another thing to notice is FFT's saw-like border towards smaller an+bn.
It is very visible for Pentium and core i7, for example toom43.  This is
caused by that in the middle of a colour, the infinity operand is
perfectly balanced, and this is of course where a toom algorithm becomes
most efficient.  I.e., the saw pattern is not caused by FFT speed
variations but by Toom speed variations.

-- 
Torbjörn