middle products
David Harvey
dmharvey at cims.nyu.edu
Wed Jun 17 23:30:11 CEST 2009
On Jun 17, 2009, at 2:41 AM, Niels Möller wrote:
> David Harvey <dmharvey at cims.nyu.edu> writes:
>
>> It computes an approximation to B^(2n) / d, where d is the n-limb
>> input.
>>
>> I profiled it against mpn_tdiv_qr in GMP 4.3.1. I don't know my way
>> around all the new division code in GMP, I expect something in there
>> can beat mpn_tdiv_qr.
>
> Relevant functions in the new division code are
>
> mpn_dc_div_qr: computes both quotient and remainder
>
> mpn_dc_divappr_q: computes approximate quotient
>
> mpn_dc_div_q: computes approximate quotient and somehow corercts it
> if needed
>
> I think at least mpn_dc_div_qr should run at about the same time as
> the old mpn_tdiv_qr, since both use the same Burnikel-Ziegler
> divide-and-conquer algorithm.
Here are the figures for those routines (assuming I am calling them
correctly....)
| inv_bc | inv_mp | div_qr | div_q | divappr_q|
10 | 1.34e+03 | 1.33e+03 | 2.01e+03 | 2.69e+03 | 2.09e+03 |
11 | 1.67e+03 | 1.68e+03 | 2.30e+03 | 2.82e+03 | 2.20e+03 |
12 | 1.72e+03 | 1.73e+03 | 2.32e+03 | 2.78e+03 | 2.38e+03 |
13 | 1.76e+03 | 1.77e+03 | 2.47e+03 | 3.00e+03 | 2.40e+03 |
14 | 1.84e+03 | 1.85e+03 | 2.56e+03 | 3.34e+03 | 2.64e+03 |
16 | 2.18e+03 | 2.16e+03 | 2.82e+03 | 3.54e+03 | 2.74e+03 |
17 | 2.35e+03 | 2.14e+03 | 2.95e+03 | 3.56e+03 | 2.92e+03 |
19 | 2.86e+03 | 2.28e+03 | 3.24e+03 | 3.90e+03 | 3.18e+03 |
21 | 3.03e+03 | 2.54e+03 | 3.56e+03 | 4.09e+03 | 3.49e+03 |
23 | 3.27e+03 | 2.85e+03 | 3.84e+03 | 4.38e+03 | 3.62e+03 |
25 | 3.61e+03 | 3.15e+03 | 4.17e+03 | 4.83e+03 | 3.93e+03 |
28 | 4.50e+03 | 3.73e+03 | 4.75e+03 | 5.46e+03 | 4.48e+03 |
31 | 5.22e+03 | 4.10e+03 | 5.40e+03 | 6.13e+03 | 4.94e+03 |
34 | 6.10e+03 | 4.66e+03 | 6.06e+03 | 6.59e+03 | 5.56e+03 |
37 | 6.42e+03 | 4.89e+03 | 6.55e+03 | 7.20e+03 | 5.80e+03 |
41 | 7.97e+03 | 5.76e+03 | 7.39e+03 | 7.73e+03 | 6.53e+03 |
45 | 9.39e+03 | 6.49e+03 | 8.23e+03 | 8.48e+03 | 7.31e+03 |
50 | 1.06e+04 | 7.72e+03 | 9.78e+03 | 1.01e+04 | 8.57e+03 |
55 | 1.25e+04 | 8.60e+03 | 1.14e+04 | 1.11e+04 | 9.67e+03 |
61 | 1.48e+04 | 9.81e+03 | 1.44e+04 | 1.36e+04 | 1.19e+04 |
67 | 1.75e+04 | 1.16e+04 | 1.69e+04 | 1.62e+04 | 1.40e+04 |
74 | 2.10e+04 | 1.41e+04 | 1.95e+04 | 1.69e+04 | 1.52e+04 |
81 | 2.48e+04 | 1.51e+04 | 2.31e+04 | 2.70e+04 | 1.82e+04 |
89 | 2.80e+04 | 1.71e+04 | 2.62e+04 | 2.28e+04 | 2.05e+04 |
98 | 3.28e+04 | 2.06e+04 | 3.02e+04 | 2.59e+04 | 2.37e+04 |
108 | 4.06e+04 | 2.31e+04 | 3.40e+04 | 3.53e+04 | 2.64e+04 |
119 | 4.43e+04 | 2.68e+04 | 4.24e+04 | 4.04e+04 | 3.15e+04 |
131 | 5.34e+04 | 3.16e+04 | 5.07e+04 | 4.70e+04 | 3.77e+04 |
144 | 5.99e+04 | 3.64e+04 | 5.50e+04 | 5.16e+04 | 4.18e+04 |
158 | 6.93e+04 | 4.28e+04 | 6.67e+04 | 5.93e+04 | 4.94e+04 |
174 | 8.31e+04 | 4.84e+04 | 7.74e+04 | 6.73e+04 | 5.77e+04 |
191 | 9.67e+04 | 5.58e+04 | 9.01e+04 | 7.64e+04 | 6.62e+04 |
211 | 1.10e+05 | 6.52e+04 | 1.07e+05 | 8.94e+04 | 7.83e+04 |
232 | 1.28e+05 | 7.51e+04 | 1.19e+05 | 9.96e+04 | 8.82e+04 |
255 | 1.51e+05 | 8.55e+04 | 1.43e+05 | 1.20e+05 | 1.03e+05 |
281 | 1.74e+05 | 1.01e+05 | 1.76e+05 | 1.44e+05 | 1.26e+05 |
309 | 2.03e+05 | 1.18e+05 | 2.04e+05 | 1.68e+05 | 1.49e+05 |
340 | 2.38e+05 | 1.37e+05 | 2.32e+05 | 1.89e+05 | 1.71e+05 |
374 | 2.74e+05 | 1.56e+05 | 2.70e+05 | 2.17e+05 | 1.98e+05 |
411 | 3.26e+05 | 1.82e+05 | 3.23e+05 | 2.62e+05 | 2.38e+05 |
452 | 3.71e+05 | 2.10e+05 | 3.67e+05 | 2.90e+05 | 2.72e+05 |
497 | 4.34e+05 | 2.43e+05 | 4.42e+05 | 3.54e+05 | 3.27e+05 |
547 | 5.02e+05 | 2.84e+05 | 5.35e+05 | 4.20e+05 | 3.88e+05 |
602 | 5.75e+05 | 3.39e+05 | 5.79e+05 | 4.66e+05 | 4.29e+05 |
662 | 6.71e+05 | 3.83e+05 | 6.81e+05 | 5.42e+05 | 5.05e+05 |
728 | 7.79e+05 | 4.39e+05 | 7.57e+05 | 6.03e+05 | 5.69e+05 |
801 | 9.04e+05 | 5.02e+05 | 9.59e+05 | 7.55e+05 | 7.04e+05 |
881 | 1.03e+06 | 5.85e+05 | 1.08e+06 | 8.65e+05 | 7.96e+05 |
970 | 1.22e+06 | 6.89e+05 | 1.25e+06 | 9.85e+05 | 9.48e+05 |
1067 | 1.41e+06 | 7.94e+05 | 1.52e+06 | 1.21e+06 | 1.13e+06 |
1173 | 1.64e+06 | 9.35e+05 | 1.76e+06 | 1.38e+06 | 1.30e+06 |
1291 | 1.87e+06 | 1.06e+06 | 2.11e+06 | 1.62e+06 | 1.56e+06 |
1420 | 2.16e+06 | 1.23e+06 | 2.18e+06 | 1.74e+06 | 1.65e+06 |
1562 | 2.56e+06 | 1.44e+06 | 2.65e+06 | 2.12e+06 | 2.01e+06 |
1718 | 2.90e+06 | 1.65e+06 | 2.98e+06 | 2.38e+06 | 2.27e+06 |
1890 | 3.38e+06 | 1.93e+06 | 3.53e+06 | 2.80e+06 | 2.67e+06 |
2079 | 3.88e+06 | 2.23e+06 | 4.48e+06 | 3.38e+06 | 3.26e+06 |
2287 | 4.55e+06 | 2.61e+06 | 5.11e+06 | 3.87e+06 | 3.74e+06 |
2516 | 5.17e+06 | 3.03e+06 | 5.26e+06 | 4.23e+06 | 4.06e+06 |
2768 | 5.91e+06 | 3.46e+06 | 5.86e+06 | 4.69e+06 | 4.50e+06 |
3044 | 6.94e+06 | 4.01e+06 | 6.99e+06 | 5.58e+06 | 5.37e+06 |
3349 | 8.07e+06 | 4.57e+06 | 9.55e+06 | 7.35e+06 | 7.15e+06 |
3684 | 9.22e+06 | 5.36e+06 | 9.46e+06 | 7.59e+06 | 7.36e+06 |
4052 | 1.09e+07 | 6.24e+06 | 1.10e+07 | 8.75e+06 | 8.46e+06 |
4457 | 1.25e+07 | 7.24e+06 | 1.46e+07 | 1.12e+07 | 1.10e+07 |
4903 | 1.44e+07 | 8.43e+06 | 1.68e+07 | 1.27e+07 | 1.23e+07 |
5394 | 1.62e+07 | 9.71e+06 | 1.78e+07 | 1.41e+07 | 1.37e+07 |
5933 | 1.87e+07 | 1.11e+07 | 2.27e+07 | 1.76e+07 | 1.72e+07 |
6526 | 2.21e+07 | 1.30e+07 | 2.43e+07 | 1.91e+07 | 1.87e+07 |
7179 | 2.51e+07 | 1.48e+07 | 3.02e+07 | 2.29e+07 | 2.24e+07 |
7897 | 2.90e+07 | 1.76e+07 | 3.51e+07 | 2.72e+07 | 2.64e+07 |
8687 | 3.33e+07 | 2.04e+07 | 3.95e+07 | 2.96e+07 | 2.89e+07 |
9555 | 3.88e+07 | 2.42e+07 | 4.72e+07 | 3.59e+07 | 3.50e+07 |
10511 | 4.37e+07 | 2.72e+07 | 5.31e+07 | 3.98e+07 | 3.88e+07 |
11562 | 5.12e+07 | 3.15e+07 | 5.85e+07 | 4.68e+07 | 4.58e+07 |
12718 | 5.90e+07 | 3.67e+07 | 6.72e+07 | 5.32e+07 | 5.21e+07 |
13990 | 6.79e+07 | 4.18e+07 | 7.78e+07 | 6.16e+07 | 6.05e+07 |
15389 | 7.99e+07 | 4.81e+07 | 9.80e+07 | 7.54e+07 | 7.39e+07 |
16928 | 8.73e+07 | 5.53e+07 | 8.67e+07 | 6.98e+07 | 6.83e+07 |
18621 | 1.04e+08 | 6.55e+07 | 1.13e+08 | 9.21e+07 | 9.03e+07 |
david
More information about the gmp-devel
mailing list