mpn_invert implementing ApproximateReciprocal.
Torbjorn Granlund
tg at gmplib.org
Fri Dec 11 17:15:34 CET 2009
Very, very nice work, Marco!
I send you a very small table of times (cycles) used to compute the
inverse of a number with "size" limbs.
Tests performed on shell.gmplib (64-bit Core i7, IIRC).
size current Newton Newton+bnm1
50 14616 13152 13172
150 92584 75720 73368
250 209064 169736 157912
350 353652 277704 260520
450 535476 419720 387276
It is worth using Newton's iteration from around 30 limbs, the speed-up
introduced by WRAP_MOD_BNM1 is barely noticeable below 150 limbs and
starts being relevant above 200. Note that within the same environment
MUL_FFT_MODF_THRESHOLD = 400, i.e. the product mod B^n+1 using FFT can not
give any advantage so early.
Nice figures. I actually get
#define INV_NEWTON_THRESHOLD 15
#define INV_MULMOD_BNM1_THRESHOLD 130
on shell.gmplib.org.
On a machine with faster basecase multiply, such as panther.gmplib.org,
I get
#define INV_NEWTON_THRESHOLD 33
#define INV_MULMOD_BNM1_THRESHOLD 174
The column "current" gives the time of the function mpn_invert currently
available in the repository. "Newton" is the time for the code using
Newton's iteration. The last column is the same code with WRAP_MOD_BNM1=1.
The size where the code starts winning is very nice! But I expected a
bigger effect of bnm1. It is possible to make mpn_invert and mpn_binvert
run at about the same speed for large sizes. But this is not currently
the case:
10 0.000000615 #0.000000163
20 0.000001819 #0.000000586
40 0.000005808 #0.000002005
80 0.000018237 #0.000007122
160 0.000056544 #0.000026310
320 0.000168331 #0.000078748
640 0.000492760 #0.000240207
1280 0.001181596 #0.000658926
2560 0.002886252 #0.001624284
5120 0.006674533 #0.003929748
10240 0.015298000 #0.009517735
20480 0.034995000 #0.021580000
40960 0.078814000 #0.051110000
81920 0.171726000 #0.115397000
The file was coded as a replacement for the current mpn/generic/invert.c,
but maybe we prefer to call this function mpn_approxinvert ? Or something
like that?
Perhaps we should have separate "mpn_invertappr" (following the naming
scheme of divappr) and mpn_invert. But I'd suggest we just let this be
mpn_invert for now.
--
Torbjörn
More information about the gmp-devel
mailing list