middle products
David Harvey
dmharvey at cims.nyu.edu
Tue Jun 16 19:39:54 CEST 2009
On Jun 9, 2009, at 6:55 PM, Paul Zimmermann wrote:
> as a first application, I suggest trying the middle product for the
> inverse computation. See for example Algorithm 43
> (ApproximateReciprocal)
> from [1], more precisely T = A*X_h at step 5.
>
> Paul Zimmermann
>
> [1] Modern Computer Arithmetic, Richard Brent and Paul Zimmermann,
> version 0.2.1, March 2009, http://www.loria.fr/~zimmerma/mca/
> mca-0.2.1.pdf.
I had a stab at this. I needed to rework the estimates since the
middle product introduces more errors (mainly the missed terms at the
low end).
The code is at:
http://cims.nyu.edu/~harvey/mulmid/invert.c
(still needs a lot of work)
It computes an approximation to B^(2n) / d, where d is the n-limb input.
I profiled it against mpn_tdiv_qr in GMP 4.3.1. I don't know my way
around all the new division code in GMP, I expect something in there
can beat mpn_tdiv_qr. Here are the results, up to 2000 limbs. Columns
are:
* number of limbs
* cycles for invert_basecase, which just uses mpn_tdiv_qr plus a
small amount of data copying
* cycles for invert_mp, which uses Newton iteration + middle product,
recursing down to invert_basecase below n = 15 (which is empirically
the optimal threshold)
* ratio of the last two columns
| inv_bc | inv_mp | ratio |
10 | 1.35e+03 | 1.36e+03 | 1.007 |
11 | 1.70e+03 | 1.70e+03 | 1.005 |
12 | 1.68e+03 | 1.69e+03 | 1.005 |
13 | 1.77e+03 | 1.79e+03 | 1.013 |
14 | 1.91e+03 | 1.92e+03 | 1.005 |
16 | 2.19e+03 | 2.15e+03 | 0.984 |
17 | 2.31e+03 | 2.12e+03 | 0.917 |
19 | 2.87e+03 | 2.31e+03 | 0.804 |
21 | 3.02e+03 | 2.56e+03 | 0.848 |
23 | 3.26e+03 | 2.87e+03 | 0.882 |
25 | 3.60e+03 | 3.14e+03 | 0.873 |
28 | 4.50e+03 | 3.69e+03 | 0.819 |
31 | 5.23e+03 | 4.04e+03 | 0.773 |
34 | 6.11e+03 | 4.63e+03 | 0.758 |
37 | 6.41e+03 | 4.86e+03 | 0.757 |
41 | 7.95e+03 | 5.70e+03 | 0.717 |
45 | 9.39e+03 | 6.43e+03 | 0.684 |
50 | 1.05e+04 | 7.67e+03 | 0.728 |
55 | 1.27e+04 | 8.46e+03 | 0.666 |
61 | 1.50e+04 | 9.69e+03 | 0.648 |
67 | 1.78e+04 | 1.16e+04 | 0.654 |
74 | 2.10e+04 | 1.40e+04 | 0.666 |
81 | 2.52e+04 | 1.49e+04 | 0.593 |
89 | 2.83e+04 | 1.70e+04 | 0.600 |
98 | 3.34e+04 | 2.05e+04 | 0.614 |
108 | 4.11e+04 | 2.30e+04 | 0.559 |
119 | 4.49e+04 | 2.66e+04 | 0.593 |
131 | 5.39e+04 | 3.15e+04 | 0.585 |
144 | 5.90e+04 | 3.61e+04 | 0.612 |
158 | 6.96e+04 | 4.24e+04 | 0.608 |
174 | 8.27e+04 | 4.81e+04 | 0.582 |
191 | 9.72e+04 | 5.56e+04 | 0.572 |
211 | 1.10e+05 | 6.51e+04 | 0.589 |
232 | 1.27e+05 | 7.49e+04 | 0.589 |
255 | 1.52e+05 | 8.56e+04 | 0.564 |
281 | 1.75e+05 | 1.01e+05 | 0.577 |
309 | 2.03e+05 | 1.17e+05 | 0.577 |
340 | 2.39e+05 | 1.37e+05 | 0.571 |
374 | 2.72e+05 | 1.55e+05 | 0.569 |
411 | 3.29e+05 | 1.81e+05 | 0.551 |
452 | 3.72e+05 | 2.08e+05 | 0.560 |
497 | 4.40e+05 | 2.42e+05 | 0.550 |
547 | 5.07e+05 | 2.84e+05 | 0.559 |
602 | 5.81e+05 | 3.38e+05 | 0.582 |
662 | 6.80e+05 | 3.85e+05 | 0.566 |
728 | 7.85e+05 | 4.43e+05 | 0.564 |
801 | 9.11e+05 | 5.05e+05 | 0.555 |
881 | 1.04e+06 | 5.88e+05 | 0.566 |
970 | 1.23e+06 | 6.89e+05 | 0.561 |
1067 | 1.40e+06 | 7.94e+05 | 0.566 |
1173 | 1.65e+06 | 9.35e+05 | 0.566 |
1291 | 1.88e+06 | 1.06e+06 | 0.567 |
1420 | 2.17e+06 | 1.22e+06 | 0.563 |
1562 | 2.57e+06 | 1.44e+06 | 0.561 |
1718 | 2.91e+06 | 1.65e+06 | 0.566 |
1890 | 3.39e+06 | 1.92e+06 | 0.567 |
2079 | 3.90e+06 | 2.22e+06 | 0.569 |
2287 | 4.57e+06 | 2.60e+06 | 0.570 |
2516 | 5.20e+06 | 3.02e+06 | 0.581 |
2768 | 5.95e+06 | 3.44e+06 | 0.579 |
3044 | 6.94e+06 | 4.00e+06 | 0.577 |
3349 | 8.06e+06 | 4.55e+06 | 0.565 |
3684 | 9.20e+06 | 5.35e+06 | 0.582 |
4052 | 1.08e+07 | 6.23e+06 | 0.577 |
4457 | 1.25e+07 | 7.24e+06 | 0.579 |
4903 | 1.46e+07 | 8.38e+06 | 0.573 |
5394 | 1.66e+07 | 9.71e+06 | 0.585 |
5933 | 1.92e+07 | 1.11e+07 | 0.580 |
6526 | 2.28e+07 | 1.30e+07 | 0.573 |
7179 | 2.55e+07 | 1.49e+07 | 0.585 |
7897 | 2.99e+07 | 1.77e+07 | 0.591 |
8687 | 3.39e+07 | 2.03e+07 | 0.599 |
9555 | 3.96e+07 | 2.40e+07 | 0.605 |
10511 | 4.45e+07 | 2.71e+07 | 0.608 |
11562 | 5.22e+07 | 3.15e+07 | 0.604 |
12718 | 5.99e+07 | 3.67e+07 | 0.613 |
13990 | 6.87e+07 | 4.17e+07 | 0.607 |
15389 | 8.11e+07 | 4.86e+07 | 0.599 |
16928 | 8.98e+07 | 5.53e+07 | 0.615 |
18621 | 1.05e+08 | 6.55e+07 | 0.621 |
david
More information about the gmp-devel
mailing list