T3/T3 mul_2 and addmul_2
Torbjorn Granlund
tg at gmplib.org
Mon Mar 25 19:45:27 CET 2013
>> If you want to play with this, please start with the checked in code
>> (you'll need to fresh configure.ac to allow the aormul_2 'multifunc'
>> name). The first thing to try is its speed compared to the code you
>> timed above.
> I'm getting wildly different performance characteristics for the code
> you checked in, for example for mpn_mul_2 the current GMP tree gives:
> davem at patience:~/src/GMP/HG/build-sparc64-ultrasparct4/tune$ ./speed -C -s 32-64 -t 2 mpn_mul_2
> overhead 6.06 cycles, precision 10000 units of 3.51e-10 secs, CPU freq 2847.28 MHz
> mpn_mul_2
> 32 8.5705
> 34 7.6267
> 36 8.5262
> 38 7.5665
> 40 8.5000
> 42 7.5605
> 44 8.4694
> 46 7.5275
> 48 8.4615
> 50 7.5248
> 52 8.4455
> 54 7.4956
> 56 8.4301
> 58 7.5022
> 60 8.4167
> 62 7.4635
> 64 8.4048
Why are you using just even sizes? Perhaps there is fluctuation for
both variants, but you strike only one? Odd sizes might trigger the
same behaviour for the checked-in code.
These fluctuations are presumably alignment dependent, where rp-up mod
2^t is what is relevant, and t is some small constant.
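To make the quantity rp-up mod 2^t concrete, here is a minimal sketch in C
of how one might compute it for a pair of operand pointers. The helper name
rel_align and the choice to measure the difference in bytes are assumptions
made for illustration; nothing like this is part of GMP or the speed program.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not a GMP function: byte distance from up to rp,
   reduced modulo 2^t.  The subtraction is done on unsigned integers, so
   a negative distance wraps and still reduces correctly. */
static unsigned rel_align(const void *rp, const void *up, unsigned t)
{
    /* Shifting by the full width of the type would be undefined. */
    assert(t < 8 * sizeof(uintptr_t));

    uintptr_t diff = (uintptr_t)rp - (uintptr_t)up;
    uintptr_t mask = ((uintptr_t)1 << t) - 1;
    return (unsigned)(diff & mask);
}
```

Recording this value alongside each timing, for a few candidate values of t,
would show whether the slow and fast sizes fall into different residue classes.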
I cannot recall which edits I made between these variants. If only the
checked-in code has fluctuations, then it should be no problem finding
an edit which avoids them. If both variants have fluctuations, then it
will be harder, of course.
--
Torbjörn