mpn_invert implementing ApproximateReciprocal.

Fri Dec 11 17:15:34 CET 2009

Very, very nice work, Marco!

  I send you a very small table of times (cycles) used to compute the
  inverse of a number with "size" limbs.
  Tests performed on shell.gmplib (64-bit Core i7, IIRC).

  size       current      Newton    Newton+bnm1
  50           14616       13152        13172
  150          92584       75720        73368
  250         209064      169736       157912
  350         353652      277704       260520
  450         535476      419720       387276

  It is worth using Newton's iteration from around 30 limbs, the speed-up
  introduced by WRAP_MOD_BNM1 is barely noticeable below 150 limbs and
  starts being relevant above 200. Note that within the same environment
  MUL_FFT_MODF_THRESHOLD = 400, i.e. the product mod B^n+1 using FFT can not
  give any advantage so early.

Nice figures. I actually get

#define INV_NEWTON_THRESHOLD             15
#define INV_MULMOD_BNM1_THRESHOLD       130

on shell.gmplib.org.

On a machine with faster basecase multiply, such as panther.gmplib.org,
I get

#define INV_NEWTON_THRESHOLD             33
#define INV_MULMOD_BNM1_THRESHOLD       174

  The column "current" gives the time of the function mpn_invert currently
  available in the repository. "Newton" is the time for the code using
  Newton's iteration. The last column is the same code with WRAP_MOD_BNM1=1.

The size where the code starts winning is very nice!  But I expected a
bigger effect of bnm1.  It is possible to make mpn_invert and mpn_binvert
run at about the same speed for large sizes.  But this is not currently
the case:

10        0.000000615  #0.000000163
20        0.000001819  #0.000000586
40        0.000005808  #0.000002005
80        0.000018237  #0.000007122
160       0.000056544  #0.000026310
320       0.000168331  #0.000078748
640       0.000492760  #0.000240207
1280      0.001181596  #0.000658926
2560      0.002886252  #0.001624284
5120      0.006674533  #0.003929748
10240     0.015298000  #0.009517735
20480     0.034995000  #0.021580000
40960     0.078814000  #0.051110000
81920     0.171726000  #0.115397000

  The file was coded as a replacement for the current mpn/generic/invert.c,
  but maybe we prefer to call this function mpn_approxinvert ? Or something
  like that?

Perhaps we should have separate "mpn_invertappr" (following the naming
scheme of divappr) and mpn_invert.  But I'd suggest we just let this be
mpn_invert for now.

-- 
Torbjörn