GMP 4.3 multiplication performance
Torbjorn Granlund
tg at gmplib.org
Wed Jun 3 00:07:35 CEST 2009
nisse at lysator.liu.se (Niels Möller) writes:
One comment on your interpolation function: You don't need any scratch
space. But then for the operation
/* W2 =(W2 - W4)/3 - W0<<2 */
you use mpn_submul_1 with a constant multiplier of 4, which I imagine
is more costly than a shift. toom_interpolate_7pts has the same
problem, and that code does the opposite, it uses mpn_lshift to a
scratch area followed by mpn_sub_n.
I'm not sure what's fastest, but it's definitely annoying that you
can't shift without using a scratch area.
One should use mpn_sublsh_n. Unfortunately, that function is currently
vapourware, since no such code is finished. :-)
If using submul_1 removes the last scratch space needs, then perhaps use
it unconditionally, instead of lshift. Optionally, use sublsh_n if
HAVE_NATIVE_mpn_sublsh_n and submul_1 otherwise.
Here is the full matrix of planned combination function. The cycle
estimates are for the "1" variants.
COMBINATION FUNCTIONS
cycles current/best (c=1)
shift-by-1 shift-by-c K8 P6-15
addlsh1 (a + 2b) addlsh (a + (2^c)b) 2/1.5
sublsh1 (a - 2b) sublsh (a - (2^c)b) 2.21/1.7
rsblsh1 (2a - b) rsblsh ((2^c)a - b) -/1.5
lsh1add 2(a + b) lshadd (2^c)(a + b) -/1.5
lsh1sub 2(a - b) lshsub (2^c)(a - b) -/1.5
addrsh1 (a + b/2) addrsh (a + b/(2^c)) -/1.5
subrsh1 (a - b/2) subrsh (a - b/(2^c)) -/1.7
rsbrsh1 (a/2 - b) rsbrsh (a/(2^c) - b) -/1.5
rsh1add (a + b)/2 rshadd (a + b)/(2^c) 2.14/1.5
rsh1sub (a - b)/2 rshsub (a - b)/(2^c) 2.14/1.5
addadd (a + b + c) -/2
addsub (a + b - c) -/2
--
Torbjörn
More information about the gmp-devel
mailing list