middle products

David Harvey dmharvey at cims.nyu.edu
Tue Jun 16 19:39:54 CEST 2009


On Jun 9, 2009, at 6:55 PM, Paul Zimmermann wrote:

> as a first application, I suggest trying the middle product for the
> inverse computation. See for example Algorithm 43  
> (ApproximateReciprocal)
> from [1], more precisely T = A*X_h at step 5.
>
> Paul Zimmermann
>
> [1] Modern Computer Arithmetic, Richard Brent and Paul Zimmermann,
> version 0.2.1, March 2009, http://www.loria.fr/~zimmerma/mca/ 
> mca-0.2.1.pdf.

I had a stab at this. I needed to rework the estimates since the  
middle product introduces more errors (mainly the missed terms at the  
low end).

The code is at:

http://cims.nyu.edu/~harvey/mulmid/invert.c

(still needs a lot of work)

It computes an approximation to B^(2n) / d, where d is the n-limb input.

I profiled it against mpn_tdiv_qr in GMP 4.3.1. I don't know my way  
around all the new division code in GMP, I expect something in there  
can beat mpn_tdiv_qr. Here are the results, up to 2000 limbs. Columns  
are:

* number of limbs
* cycles for invert_basecase, which just uses mpn_tdiv_qr plus a  
small amount of data copying
* cycles for invert_mp, which uses Newton iteration + middle product,  
recursing down to invert_basecase below n = 15 (which is empirically  
the optimal threshold)
* ratio of the last two columns

       |  inv_bc  |  inv_mp  |  ratio  |
    10 | 1.35e+03 | 1.36e+03 |  1.007  |
    11 | 1.70e+03 | 1.70e+03 |  1.005  |
    12 | 1.68e+03 | 1.69e+03 |  1.005  |
    13 | 1.77e+03 | 1.79e+03 |  1.013  |
    14 | 1.91e+03 | 1.92e+03 |  1.005  |
    16 | 2.19e+03 | 2.15e+03 |  0.984  |
    17 | 2.31e+03 | 2.12e+03 |  0.917  |
    19 | 2.87e+03 | 2.31e+03 |  0.804  |
    21 | 3.02e+03 | 2.56e+03 |  0.848  |
    23 | 3.26e+03 | 2.87e+03 |  0.882  |
    25 | 3.60e+03 | 3.14e+03 |  0.873  |
    28 | 4.50e+03 | 3.69e+03 |  0.819  |
    31 | 5.23e+03 | 4.04e+03 |  0.773  |
    34 | 6.11e+03 | 4.63e+03 |  0.758  |
    37 | 6.41e+03 | 4.86e+03 |  0.757  |
    41 | 7.95e+03 | 5.70e+03 |  0.717  |
    45 | 9.39e+03 | 6.43e+03 |  0.684  |
    50 | 1.05e+04 | 7.67e+03 |  0.728  |
    55 | 1.27e+04 | 8.46e+03 |  0.666  |
    61 | 1.50e+04 | 9.69e+03 |  0.648  |
    67 | 1.78e+04 | 1.16e+04 |  0.654  |
    74 | 2.10e+04 | 1.40e+04 |  0.666  |
    81 | 2.52e+04 | 1.49e+04 |  0.593  |
    89 | 2.83e+04 | 1.70e+04 |  0.600  |
    98 | 3.34e+04 | 2.05e+04 |  0.614  |
   108 | 4.11e+04 | 2.30e+04 |  0.559  |
   119 | 4.49e+04 | 2.66e+04 |  0.593  |
   131 | 5.39e+04 | 3.15e+04 |  0.585  |
   144 | 5.90e+04 | 3.61e+04 |  0.612  |
   158 | 6.96e+04 | 4.24e+04 |  0.608  |
   174 | 8.27e+04 | 4.81e+04 |  0.582  |
   191 | 9.72e+04 | 5.56e+04 |  0.572  |
   211 | 1.10e+05 | 6.51e+04 |  0.589  |
   232 | 1.27e+05 | 7.49e+04 |  0.589  |
   255 | 1.52e+05 | 8.56e+04 |  0.564  |
   281 | 1.75e+05 | 1.01e+05 |  0.577  |
   309 | 2.03e+05 | 1.17e+05 |  0.577  |
   340 | 2.39e+05 | 1.37e+05 |  0.571  |
   374 | 2.72e+05 | 1.55e+05 |  0.569  |
   411 | 3.29e+05 | 1.81e+05 |  0.551  |
   452 | 3.72e+05 | 2.08e+05 |  0.560  |
   497 | 4.40e+05 | 2.42e+05 |  0.550  |
   547 | 5.07e+05 | 2.84e+05 |  0.559  |
   602 | 5.81e+05 | 3.38e+05 |  0.582  |
   662 | 6.80e+05 | 3.85e+05 |  0.566  |
   728 | 7.85e+05 | 4.43e+05 |  0.564  |
   801 | 9.11e+05 | 5.05e+05 |  0.555  |
   881 | 1.04e+06 | 5.88e+05 |  0.566  |
   970 | 1.23e+06 | 6.89e+05 |  0.561  |
  1067 | 1.40e+06 | 7.94e+05 |  0.566  |
  1173 | 1.65e+06 | 9.35e+05 |  0.566  |
  1291 | 1.88e+06 | 1.06e+06 |  0.567  |
  1420 | 2.17e+06 | 1.22e+06 |  0.563  |
  1562 | 2.57e+06 | 1.44e+06 |  0.561  |
  1718 | 2.91e+06 | 1.65e+06 |  0.566  |
  1890 | 3.39e+06 | 1.92e+06 |  0.567  |
  2079 | 3.90e+06 | 2.22e+06 |  0.569  |
  2287 | 4.57e+06 | 2.60e+06 |  0.570  |
  2516 | 5.20e+06 | 3.02e+06 |  0.581  |
  2768 | 5.95e+06 | 3.44e+06 |  0.579  |
  3044 | 6.94e+06 | 4.00e+06 |  0.577  |
  3349 | 8.06e+06 | 4.55e+06 |  0.565  |
  3684 | 9.20e+06 | 5.35e+06 |  0.582  |
  4052 | 1.08e+07 | 6.23e+06 |  0.577  |
  4457 | 1.25e+07 | 7.24e+06 |  0.579  |
  4903 | 1.46e+07 | 8.38e+06 |  0.573  |
  5394 | 1.66e+07 | 9.71e+06 |  0.585  |
  5933 | 1.92e+07 | 1.11e+07 |  0.580  |
  6526 | 2.28e+07 | 1.30e+07 |  0.573  |
  7179 | 2.55e+07 | 1.49e+07 |  0.585  |
  7897 | 2.99e+07 | 1.77e+07 |  0.591  |
  8687 | 3.39e+07 | 2.03e+07 |  0.599  |
  9555 | 3.96e+07 | 2.40e+07 |  0.605  |
10511 | 4.45e+07 | 2.71e+07 |  0.608  |
11562 | 5.22e+07 | 3.15e+07 |  0.604  |
12718 | 5.99e+07 | 3.67e+07 |  0.613  |
13990 | 6.87e+07 | 4.17e+07 |  0.607  |
15389 | 8.11e+07 | 4.86e+07 |  0.599  |
16928 | 8.98e+07 | 5.53e+07 |  0.615  |
18621 | 1.05e+08 | 6.55e+07 |  0.621  |

david



More information about the gmp-devel mailing list