# middle products

David Harvey dmharvey at cims.nyu.edu
Tue Jun 16 19:39:54 CEST 2009

```On Jun 9, 2009, at 6:55 PM, Paul Zimmermann wrote:

> as a first application, I suggest trying the middle product for the
> inverse computation. See for example Algorithm 43
> (ApproximateReciprocal)
> from [1], more precisely T = A*X_h at step 5.
>
> Paul Zimmermann
>
> [1] Modern Computer Arithmetic, Richard Brent and Paul Zimmermann,
> version 0.2.1, March 2009, http://www.loria.fr/~zimmerma/mca/
> mca-0.2.1.pdf.

I had a stab at this. I needed to rework the estimates since the
middle product introduces more errors (mainly the missed terms at the
low end).

The code is at:

http://cims.nyu.edu/~harvey/mulmid/invert.c

(still needs a lot of work)

It computes an approximation to B^(2n) / d, where d is the n-limb input.

I profiled it against mpn_tdiv_qr in GMP 4.3.1. I don't know my way
around all the new division code in GMP, I expect something in there
can beat mpn_tdiv_qr. Here are the results, up to 2000 limbs. Columns
are:

* number of limbs
* cycles for invert_basecase, which just uses mpn_tdiv_qr plus a
small amount of data copying
* cycles for invert_mp, which uses Newton iteration + middle product,
recursing down to invert_basecase below n = 15 (which is empirically
the optimal threshold)
* ratio of the last two columns

|  inv_bc  |  inv_mp  |  ratio  |
10 | 1.35e+03 | 1.36e+03 |  1.007  |
11 | 1.70e+03 | 1.70e+03 |  1.005  |
12 | 1.68e+03 | 1.69e+03 |  1.005  |
13 | 1.77e+03 | 1.79e+03 |  1.013  |
14 | 1.91e+03 | 1.92e+03 |  1.005  |
16 | 2.19e+03 | 2.15e+03 |  0.984  |
17 | 2.31e+03 | 2.12e+03 |  0.917  |
19 | 2.87e+03 | 2.31e+03 |  0.804  |
21 | 3.02e+03 | 2.56e+03 |  0.848  |
23 | 3.26e+03 | 2.87e+03 |  0.882  |
25 | 3.60e+03 | 3.14e+03 |  0.873  |
28 | 4.50e+03 | 3.69e+03 |  0.819  |
31 | 5.23e+03 | 4.04e+03 |  0.773  |
34 | 6.11e+03 | 4.63e+03 |  0.758  |
37 | 6.41e+03 | 4.86e+03 |  0.757  |
41 | 7.95e+03 | 5.70e+03 |  0.717  |
45 | 9.39e+03 | 6.43e+03 |  0.684  |
50 | 1.05e+04 | 7.67e+03 |  0.728  |
55 | 1.27e+04 | 8.46e+03 |  0.666  |
61 | 1.50e+04 | 9.69e+03 |  0.648  |
67 | 1.78e+04 | 1.16e+04 |  0.654  |
74 | 2.10e+04 | 1.40e+04 |  0.666  |
81 | 2.52e+04 | 1.49e+04 |  0.593  |
89 | 2.83e+04 | 1.70e+04 |  0.600  |
98 | 3.34e+04 | 2.05e+04 |  0.614  |
108 | 4.11e+04 | 2.30e+04 |  0.559  |
119 | 4.49e+04 | 2.66e+04 |  0.593  |
131 | 5.39e+04 | 3.15e+04 |  0.585  |
144 | 5.90e+04 | 3.61e+04 |  0.612  |
158 | 6.96e+04 | 4.24e+04 |  0.608  |
174 | 8.27e+04 | 4.81e+04 |  0.582  |
191 | 9.72e+04 | 5.56e+04 |  0.572  |
211 | 1.10e+05 | 6.51e+04 |  0.589  |
232 | 1.27e+05 | 7.49e+04 |  0.589  |
255 | 1.52e+05 | 8.56e+04 |  0.564  |
281 | 1.75e+05 | 1.01e+05 |  0.577  |
309 | 2.03e+05 | 1.17e+05 |  0.577  |
340 | 2.39e+05 | 1.37e+05 |  0.571  |
374 | 2.72e+05 | 1.55e+05 |  0.569  |
411 | 3.29e+05 | 1.81e+05 |  0.551  |
452 | 3.72e+05 | 2.08e+05 |  0.560  |
497 | 4.40e+05 | 2.42e+05 |  0.550  |
547 | 5.07e+05 | 2.84e+05 |  0.559  |
602 | 5.81e+05 | 3.38e+05 |  0.582  |
662 | 6.80e+05 | 3.85e+05 |  0.566  |
728 | 7.85e+05 | 4.43e+05 |  0.564  |
801 | 9.11e+05 | 5.05e+05 |  0.555  |
881 | 1.04e+06 | 5.88e+05 |  0.566  |
970 | 1.23e+06 | 6.89e+05 |  0.561  |
1067 | 1.40e+06 | 7.94e+05 |  0.566  |
1173 | 1.65e+06 | 9.35e+05 |  0.566  |
1291 | 1.88e+06 | 1.06e+06 |  0.567  |
1420 | 2.17e+06 | 1.22e+06 |  0.563  |
1562 | 2.57e+06 | 1.44e+06 |  0.561  |
1718 | 2.91e+06 | 1.65e+06 |  0.566  |
1890 | 3.39e+06 | 1.92e+06 |  0.567  |
2079 | 3.90e+06 | 2.22e+06 |  0.569  |
2287 | 4.57e+06 | 2.60e+06 |  0.570  |
2516 | 5.20e+06 | 3.02e+06 |  0.581  |
2768 | 5.95e+06 | 3.44e+06 |  0.579  |
3044 | 6.94e+06 | 4.00e+06 |  0.577  |
3349 | 8.06e+06 | 4.55e+06 |  0.565  |
3684 | 9.20e+06 | 5.35e+06 |  0.582  |
4052 | 1.08e+07 | 6.23e+06 |  0.577  |
4457 | 1.25e+07 | 7.24e+06 |  0.579  |
4903 | 1.46e+07 | 8.38e+06 |  0.573  |
5394 | 1.66e+07 | 9.71e+06 |  0.585  |
5933 | 1.92e+07 | 1.11e+07 |  0.580  |
6526 | 2.28e+07 | 1.30e+07 |  0.573  |
7179 | 2.55e+07 | 1.49e+07 |  0.585  |
7897 | 2.99e+07 | 1.77e+07 |  0.591  |
8687 | 3.39e+07 | 2.03e+07 |  0.599  |
9555 | 3.96e+07 | 2.40e+07 |  0.605  |
10511 | 4.45e+07 | 2.71e+07 |  0.608  |
11562 | 5.22e+07 | 3.15e+07 |  0.604  |
12718 | 5.99e+07 | 3.67e+07 |  0.613  |
13990 | 6.87e+07 | 4.17e+07 |  0.607  |
15389 | 8.11e+07 | 4.86e+07 |  0.599  |
16928 | 8.98e+07 | 5.53e+07 |  0.615  |
18621 | 1.05e+08 | 6.55e+07 |  0.621  |

david

```