T3/T3 mul_2 and addmul_2

Torbjorn Granlund tg at gmplib.org
Mon Mar 25 19:45:27 CET 2013

  > If you want to play with this, please start with the checked in code
  > (you'll need to fresh configure.ac to allow the aormul_2 'multifunc'
  > name).  The first thing to try is its speed compared to the code you
  > timed above.
  I'm getting wildly different performance characteristics for the code
  you checked in, for example for mpn_mul_2 the current GMP tree gives:
  davem at patience:~/src/GMP/HG/build-sparc64-ultrasparct4/tune$ ./speed -C -s 32-64 -t 2 mpn_mul_2
  overhead 6.06 cycles, precision 10000 units of 3.51e-10 secs, CPU freq 2847.28 MHz
  32             8.5705
  34             7.6267
  36             8.5262
  38             7.5665
  40             8.5000
  42             7.5605
  44             8.4694
  46             7.5275
  48             8.4615
  50             7.5248
  52             8.4455
  54             7.4956
  56             8.4301
  58             7.5022
  60             8.4167
  62             7.4635
  64             8.4048

Why are you using just even sizes?  Perhaps both variants fluctuate,
but your choice of sizes hits it in only one of them?  Odd sizes might
trigger the same behaviour for the checked-in code.

These fluctuations are presumably alignment dependent, where rp-up mod
2^t is what is relevant, and t is some small constant.
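
To make the quantity concrete, here is a minimal sketch in C of how
rp-up mod 2^t could be computed for a pair of operand pointers.  The
helper name rel_align is hypothetical and not part of GMP, and the
result is in bytes; which t actually matters on this CPU is not known
and would have to be found by experiment.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper, not a GMP function: return (rp - up) mod 2^t,
   measured in bytes.  The subtraction is done on unsigned integers, so
   it wraps around and the mask yields a non-negative residue even when
   rp lies below up.  */
static uintptr_t
rel_align (const void *rp, const void *up, unsigned t)
{
  /* t must be smaller than the pointer width for the shift to be valid.  */
  assert (t < sizeof (uintptr_t) * CHAR_BIT);
  uintptr_t mask = ((uintptr_t) 1 << t) - 1;
  return ((uintptr_t) rp - (uintptr_t) up) & mask;
}
```

Printing this value for the operands of each timed size, for a few
small t, would show whether the slow and fast sizes fall into different
residue classes.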

I cannot recall which edits I made between these variants.  If only the
checked-in code has fluctuations, then it should be no problem finding
an edit which avoids them.  If both variants have fluctuations, then it
will be harder, of course.

