Toom-8 testing (mpz level)

Torbjorn Granlund tg at gmplib.org
Thu Oct 8 10:12:08 CEST 2009


nisse at lysator.liu.se (Niels Möller) writes:

  Torbjorn Granlund <tg at gmplib.org> writes:
  
  > Yes, but also notice the little tongue of basecase-blue between toom22
  > and toom32, and the larger tongue between toom32 and toom42.
  
  I wonder how large the performance difference is in those tongues.

I checked one value, 99,36 for i7.  Basecase took 16036 cycles and
toom42 (2nd best) took 17340 cycles.  Other values in the tongue give
about the same difference, i.e., around 10%.
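
For the one measured point the gap works out to a little over 8%.  A
minimal sketch of that arithmetic follows; the helper name
slowdown_percent is made up for illustration and is not part of GMP
or mulmes.

```c
#include <assert.h>

/* Relative slowdown, in percent, of the runner-up algorithm compared
   with the fastest one.  Hypothetical helper, for illustration only. */
double slowdown_percent(double best_cycles, double second_cycles)
{
    assert(best_cycles > 0.0);
    return 100.0 * (second_cycles - best_cycles) / best_cycles;
}
```

With the figures above, slowdown_percent(16036, 17340) is
100 * 1304 / 16036, i.e. about 8.1.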

  If it's just a percent or two, I wouldn't care too much. And if it's
  more, then one might consider a more or less complete table for
  algorithm choice for sizes up to 30 limbs or so

Perhaps tables would be best.
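
A rough sketch of how such a table might be built and queried, under
several assumptions: the 30-limb limit comes from the suggestion
above, the algorithm numbering 0..3 is arbitrary, and toy_cost is an
invented stand-in for real cycle measurements.  None of these names
exist in GMP.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LIMBS 30   /* table covers sizes 1..30, as suggested above */
#define N_ALGS    4    /* ASSUMPTION: 0 = basecase, 1..3 = some toom variants */

/* Cost callback: cycles for algorithm alg at operand sizes (an, bn).
   Hypothetical interface; a real tuner would time the actual code. */
typedef double (*cost_fn)(int alg, int an, int bn);

/* best_alg[an][bn] holds the cheapest algorithm for an >= bn. */
static unsigned char best_alg[MAX_LIMBS + 1][MAX_LIMBS + 1];

/* Invented cost model, used only so the sketch can be run:
   higher-numbered algorithms have more overhead but scale better. */
double toy_cost(int alg, int an, int bn)
{
    return 40.0 * alg + (double) an * bn * (1.0 - 0.25 * alg);
}

/* Fill the table by picking the cheapest algorithm at every size pair. */
void build_table(cost_fn cost)
{
    assert(cost != NULL);
    for (int an = 1; an <= MAX_LIMBS; an++)
        for (int bn = 1; bn <= an; bn++) {
            int best = 0;
            for (int alg = 1; alg < N_ALGS; alg++)
                if (cost(alg, an, bn) < cost(best, an, bn))
                    best = alg;
            best_alg[an][bn] = (unsigned char) best;
        }
}

/* Return the tabulated algorithm, or -1 when the sizes fall outside
   the table and some other rule has to decide. */
int lookup_alg(int an, int bn)
{
    if (an < bn) { int t = an; an = bn; bn = t; }
    if (bn < 1 || an > MAX_LIMBS)
        return -1;
    return best_alg[an][bn];
}
```

Call build_table(toy_cost) once, then lookup_alg(an, bn) per
multiplication.  A table like this captures tongues automatically,
since every cell is decided by its own measurement.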

We could also find these tongues using the same type of linear
polynomials that we will probably need to use for choosing the right
toom.
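
One possible shape for this, as a sketch only: describe each region
of the (an, bn) plane as the set where a few linear polynomials
a*an + b*bn + c are all non-negative, and take the first region that
matches.  A tongue is then just one more region, a wedge cut out by
three lines.  Every coefficient below is invented; the wedge in
particular was chosen only so that the measured point (99,36) lands
in it, and says nothing about where the real tongue boundaries are.

```c
#include <assert.h>
#include <stddef.h>

enum mul_alg { MUL_BASECASE, MUL_TOOM22, MUL_TOOM32, MUL_TOOM42 };

/* Linear polynomial p(an, bn) = a*an + b*bn + c. */
struct linpoly { long a, b, c; };

/* A region is the set of (an, bn) where all its polynomials are >= 0. */
struct region {
    enum mul_alg alg;
    size_t npoly;
    struct linpoly poly[3];
};

/* HYPOTHETICAL regions, tried in order.  Real coefficients would have
   to come from measurements on each CPU. */
static const struct region regions[] = {
    /* plain basecase area: bn <= 19 */
    { MUL_BASECASE, 1, { { 0, -1, 19 } } },
    /* a basecase "tongue": bn <= 40 and 2.6 <= an/bn <= 3 */
    { MUL_BASECASE, 3, { { 0, -1, 40 }, { 5, -13, 0 }, { -1, 3, 0 } } },
    /* toom22 where 4*an < 5*bn */
    { MUL_TOOM22, 1, { { -4, 5, -1 } } },
    /* toom32 where 4*an < 7*bn */
    { MUL_TOOM32, 1, { { -4, 7, -1 } } },
};

/* Pick the algorithm of the first matching region, toom42 otherwise. */
enum mul_alg choose_mul(long an, long bn)
{
    assert(an >= bn && bn >= 1);
    for (size_t i = 0; i < sizeof regions / sizeof regions[0]; i++) {
        const struct region *r = &regions[i];
        size_t j;
        for (j = 0; j < r->npoly; j++) {
            const struct linpoly *p = &r->poly[j];
            if (p->a * an + p->b * bn + p->c < 0)
                break;              /* outside this region */
        }
        if (j == r->npoly)
            return r->alg;          /* all polynomials non-negative */
    }
    return MUL_TOOM42;
}
```

Compared with a full table this costs a handful of multiply-adds per
call and extends to any operand size, but each tongue needs its own
explicitly fitted wedge.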

  (BTW, it would be nice if you could add some numbers to the axes of
  the figures. It's hard to see on the logarithmic plots which
  operand sizes fall within these tongues.)

OK, the program lives in ~tege/GMP/mulmes.  :-)

The X plotting program (mul-paint.c) is interactive; it shows the
coordinates under the cursor when you right-click.

  There's also a tongue or two where toom62 partial sticks down into
  the area of small bn, otherwise owned by schoolbook. Most visible on
  the (logarithmic) Core i7 plot. Any explanation for that?
  
I'd call it a "disturbance area".  No, I have not been able to explain
that.  I suspect loop exit branch prediction problems in the basecase
inner loops.  (The branch predictors can learn patterns up to a max
length, say perhaps 16.  A repeated inner loop branch-back that is taken
0-15 times will always be predicted perfectly, even after the last
iteration.  But when the branch-back is taken 16 consecutive times, one
will pay an exit cost of the pipeline depth.)
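
The effect can be reproduced with a toy model.  The sketch below is
not a description of any real predictor: it assumes a single branch,
a history register of 15 outcomes (so the longest learnable pattern,
counting the branch being predicted, is 16, matching the "say perhaps
16" guess above), and 2-bit saturating counters.  The function name
and HIST_BITS are made up for illustration.

```c
#include <assert.h>
#include <string.h>

#define HIST_BITS 15   /* ASSUMPTION: 15 past outcomes + current branch = 16 */

/* Simulate a loop branch-back that is taken n times and then falls
   through once, over and over.  Returns the number of mispredictions
   during the last `measure` loop executions, after `warmup` executions
   used only to train the counters. */
long exit_mispredicts(int n, int warmup, int measure)
{
    static unsigned char counter[1u << HIST_BITS]; /* 2-bit counters */
    unsigned hist = 0;                             /* last HIST_BITS outcomes */
    long miss = 0;

    assert(n >= 0 && warmup >= 0 && measure >= 0);
    memset(counter, 0, sizeof counter);

    for (int rep = 0; rep < warmup + measure; rep++) {
        for (int i = 0; i <= n; i++) {
            int taken = (i < n);                   /* last pass exits */
            int predicted = (counter[hist] >= 2);
            if (predicted != taken && rep >= warmup)
                miss++;
            /* train the counter selected by the current history */
            if (taken && counter[hist] < 3)
                counter[hist]++;
            if (!taken && counter[hist] > 0)
                counter[hist]--;
            /* shift the outcome into the history register */
            hist = ((hist << 1) | (unsigned) taken)
                   & ((1u << HIST_BITS) - 1);
        }
    }
    return miss;
}
```

In this model any n from 0 to 15 settles to zero mispredictions,
because the position of the previous exit inside the history tells
the passes apart.  At n = 16 the history before the last taken pass
and before the exit are both all-taken, so at least one misprediction
per loop execution remains.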

The severe noise in the basecase area on the Pentium is caused by a
"write combine buffer" allocation problem (I discussed that at length
with Intel a few years ago).

-- 
Torbjörn

