T3/T3 mul_2 and addmul_2
tg at gmplib.org
Mon Mar 25 19:45:27 CET 2013
> > If you want to play with this, please start with the checked in code
> > (you'll need to fresh configure.ac to allow the aormul_2 'multifunc'
> > name). The first thing to try is its speed compared to the code you
> > timed above.
>
> I'm getting wildly different performance characteristics for the code
> you checked in, for example for mpn_mul_2 the current GMP tree gives:
>
> davem at patience:~/src/GMP/HG/build-sparc64-ultrasparct4/tune$ ./speed -C -s 32-64 -t 2 mpn_mul_2
> overhead 6.06 cycles, precision 10000 units of 3.51e-10 secs, CPU freq 2847.28 MHz

Why are you using just even sizes? Perhaps there is fluctuation for
both variants, but you strike only one? Odd sizes might trigger the
same behaviour for the checked-in code.
These fluctuations are presumably alignment dependent, where rp-up mod
2^t is what is relevant, and t is some small constant.
I cannot recall which edits I made between these variants. If only the
checked-in code has fluctuations, then it should be no problem finding
an edit which avoids them. If both variants have fluctuations, then it
will be harder, of course.