middle products

David Harvey dmharvey at cims.nyu.edu
Wed Jun 17 23:30:11 CEST 2009


On Jun 17, 2009, at 2:41 AM, Niels Möller wrote:

> David Harvey <dmharvey at cims.nyu.edu> writes:
>
>> It computes an approximation to B^(2n) / d, where d is the n-limb  
>> input.
>>
>> I profiled it against mpn_tdiv_qr in GMP 4.3.1. I don't know my way
>> around all the new division code in GMP, I expect something in there
>> can beat mpn_tdiv_qr.
>
> Relevant functions in the new division code are
>
>   mpn_dc_div_qr: computes both quotient and remainder
>
>   mpn_dc_divappr_q: computes approximate quotient
>
>   mpn_dc_div_q: computes approximate quotient and somehow corercts it
>                if needed
>
> I think at least mpn_dc_div_qr should run at about the same time as
> the old mpn_tdiv_qr, since both use the same Burnikel-Ziegler
> divide-and-conquer algorithm.

Here are the figures for those routines (assuming I am calling them  
correctly....)

       |  inv_bc  |  inv_mp  |  div_qr  |   div_q  | divappr_q|
    10 | 1.34e+03 | 1.33e+03 | 2.01e+03 | 2.69e+03 | 2.09e+03 |
    11 | 1.67e+03 | 1.68e+03 | 2.30e+03 | 2.82e+03 | 2.20e+03 |
    12 | 1.72e+03 | 1.73e+03 | 2.32e+03 | 2.78e+03 | 2.38e+03 |
    13 | 1.76e+03 | 1.77e+03 | 2.47e+03 | 3.00e+03 | 2.40e+03 |
    14 | 1.84e+03 | 1.85e+03 | 2.56e+03 | 3.34e+03 | 2.64e+03 |
    16 | 2.18e+03 | 2.16e+03 | 2.82e+03 | 3.54e+03 | 2.74e+03 |
    17 | 2.35e+03 | 2.14e+03 | 2.95e+03 | 3.56e+03 | 2.92e+03 |
    19 | 2.86e+03 | 2.28e+03 | 3.24e+03 | 3.90e+03 | 3.18e+03 |
    21 | 3.03e+03 | 2.54e+03 | 3.56e+03 | 4.09e+03 | 3.49e+03 |
    23 | 3.27e+03 | 2.85e+03 | 3.84e+03 | 4.38e+03 | 3.62e+03 |
    25 | 3.61e+03 | 3.15e+03 | 4.17e+03 | 4.83e+03 | 3.93e+03 |
    28 | 4.50e+03 | 3.73e+03 | 4.75e+03 | 5.46e+03 | 4.48e+03 |
    31 | 5.22e+03 | 4.10e+03 | 5.40e+03 | 6.13e+03 | 4.94e+03 |
    34 | 6.10e+03 | 4.66e+03 | 6.06e+03 | 6.59e+03 | 5.56e+03 |
    37 | 6.42e+03 | 4.89e+03 | 6.55e+03 | 7.20e+03 | 5.80e+03 |
    41 | 7.97e+03 | 5.76e+03 | 7.39e+03 | 7.73e+03 | 6.53e+03 |
    45 | 9.39e+03 | 6.49e+03 | 8.23e+03 | 8.48e+03 | 7.31e+03 |
    50 | 1.06e+04 | 7.72e+03 | 9.78e+03 | 1.01e+04 | 8.57e+03 |
    55 | 1.25e+04 | 8.60e+03 | 1.14e+04 | 1.11e+04 | 9.67e+03 |
    61 | 1.48e+04 | 9.81e+03 | 1.44e+04 | 1.36e+04 | 1.19e+04 |
    67 | 1.75e+04 | 1.16e+04 | 1.69e+04 | 1.62e+04 | 1.40e+04 |
    74 | 2.10e+04 | 1.41e+04 | 1.95e+04 | 1.69e+04 | 1.52e+04 |
    81 | 2.48e+04 | 1.51e+04 | 2.31e+04 | 2.70e+04 | 1.82e+04 |
    89 | 2.80e+04 | 1.71e+04 | 2.62e+04 | 2.28e+04 | 2.05e+04 |
    98 | 3.28e+04 | 2.06e+04 | 3.02e+04 | 2.59e+04 | 2.37e+04 |
   108 | 4.06e+04 | 2.31e+04 | 3.40e+04 | 3.53e+04 | 2.64e+04 |
   119 | 4.43e+04 | 2.68e+04 | 4.24e+04 | 4.04e+04 | 3.15e+04 |
   131 | 5.34e+04 | 3.16e+04 | 5.07e+04 | 4.70e+04 | 3.77e+04 |
   144 | 5.99e+04 | 3.64e+04 | 5.50e+04 | 5.16e+04 | 4.18e+04 |
   158 | 6.93e+04 | 4.28e+04 | 6.67e+04 | 5.93e+04 | 4.94e+04 |
   174 | 8.31e+04 | 4.84e+04 | 7.74e+04 | 6.73e+04 | 5.77e+04 |
   191 | 9.67e+04 | 5.58e+04 | 9.01e+04 | 7.64e+04 | 6.62e+04 |
   211 | 1.10e+05 | 6.52e+04 | 1.07e+05 | 8.94e+04 | 7.83e+04 |
   232 | 1.28e+05 | 7.51e+04 | 1.19e+05 | 9.96e+04 | 8.82e+04 |
   255 | 1.51e+05 | 8.55e+04 | 1.43e+05 | 1.20e+05 | 1.03e+05 |
   281 | 1.74e+05 | 1.01e+05 | 1.76e+05 | 1.44e+05 | 1.26e+05 |
   309 | 2.03e+05 | 1.18e+05 | 2.04e+05 | 1.68e+05 | 1.49e+05 |
   340 | 2.38e+05 | 1.37e+05 | 2.32e+05 | 1.89e+05 | 1.71e+05 |
   374 | 2.74e+05 | 1.56e+05 | 2.70e+05 | 2.17e+05 | 1.98e+05 |
   411 | 3.26e+05 | 1.82e+05 | 3.23e+05 | 2.62e+05 | 2.38e+05 |
   452 | 3.71e+05 | 2.10e+05 | 3.67e+05 | 2.90e+05 | 2.72e+05 |
   497 | 4.34e+05 | 2.43e+05 | 4.42e+05 | 3.54e+05 | 3.27e+05 |
   547 | 5.02e+05 | 2.84e+05 | 5.35e+05 | 4.20e+05 | 3.88e+05 |
   602 | 5.75e+05 | 3.39e+05 | 5.79e+05 | 4.66e+05 | 4.29e+05 |
   662 | 6.71e+05 | 3.83e+05 | 6.81e+05 | 5.42e+05 | 5.05e+05 |
   728 | 7.79e+05 | 4.39e+05 | 7.57e+05 | 6.03e+05 | 5.69e+05 |
   801 | 9.04e+05 | 5.02e+05 | 9.59e+05 | 7.55e+05 | 7.04e+05 |
   881 | 1.03e+06 | 5.85e+05 | 1.08e+06 | 8.65e+05 | 7.96e+05 |
   970 | 1.22e+06 | 6.89e+05 | 1.25e+06 | 9.85e+05 | 9.48e+05 |
  1067 | 1.41e+06 | 7.94e+05 | 1.52e+06 | 1.21e+06 | 1.13e+06 |
  1173 | 1.64e+06 | 9.35e+05 | 1.76e+06 | 1.38e+06 | 1.30e+06 |
  1291 | 1.87e+06 | 1.06e+06 | 2.11e+06 | 1.62e+06 | 1.56e+06 |
  1420 | 2.16e+06 | 1.23e+06 | 2.18e+06 | 1.74e+06 | 1.65e+06 |
  1562 | 2.56e+06 | 1.44e+06 | 2.65e+06 | 2.12e+06 | 2.01e+06 |
  1718 | 2.90e+06 | 1.65e+06 | 2.98e+06 | 2.38e+06 | 2.27e+06 |
  1890 | 3.38e+06 | 1.93e+06 | 3.53e+06 | 2.80e+06 | 2.67e+06 |
  2079 | 3.88e+06 | 2.23e+06 | 4.48e+06 | 3.38e+06 | 3.26e+06 |
  2287 | 4.55e+06 | 2.61e+06 | 5.11e+06 | 3.87e+06 | 3.74e+06 |
  2516 | 5.17e+06 | 3.03e+06 | 5.26e+06 | 4.23e+06 | 4.06e+06 |
  2768 | 5.91e+06 | 3.46e+06 | 5.86e+06 | 4.69e+06 | 4.50e+06 |
  3044 | 6.94e+06 | 4.01e+06 | 6.99e+06 | 5.58e+06 | 5.37e+06 |
  3349 | 8.07e+06 | 4.57e+06 | 9.55e+06 | 7.35e+06 | 7.15e+06 |
  3684 | 9.22e+06 | 5.36e+06 | 9.46e+06 | 7.59e+06 | 7.36e+06 |
  4052 | 1.09e+07 | 6.24e+06 | 1.10e+07 | 8.75e+06 | 8.46e+06 |
  4457 | 1.25e+07 | 7.24e+06 | 1.46e+07 | 1.12e+07 | 1.10e+07 |
  4903 | 1.44e+07 | 8.43e+06 | 1.68e+07 | 1.27e+07 | 1.23e+07 |
  5394 | 1.62e+07 | 9.71e+06 | 1.78e+07 | 1.41e+07 | 1.37e+07 |
  5933 | 1.87e+07 | 1.11e+07 | 2.27e+07 | 1.76e+07 | 1.72e+07 |
  6526 | 2.21e+07 | 1.30e+07 | 2.43e+07 | 1.91e+07 | 1.87e+07 |
  7179 | 2.51e+07 | 1.48e+07 | 3.02e+07 | 2.29e+07 | 2.24e+07 |
  7897 | 2.90e+07 | 1.76e+07 | 3.51e+07 | 2.72e+07 | 2.64e+07 |
  8687 | 3.33e+07 | 2.04e+07 | 3.95e+07 | 2.96e+07 | 2.89e+07 |
  9555 | 3.88e+07 | 2.42e+07 | 4.72e+07 | 3.59e+07 | 3.50e+07 |
10511 | 4.37e+07 | 2.72e+07 | 5.31e+07 | 3.98e+07 | 3.88e+07 |
11562 | 5.12e+07 | 3.15e+07 | 5.85e+07 | 4.68e+07 | 4.58e+07 |
12718 | 5.90e+07 | 3.67e+07 | 6.72e+07 | 5.32e+07 | 5.21e+07 |
13990 | 6.79e+07 | 4.18e+07 | 7.78e+07 | 6.16e+07 | 6.05e+07 |
15389 | 7.99e+07 | 4.81e+07 | 9.80e+07 | 7.54e+07 | 7.39e+07 |
16928 | 8.73e+07 | 5.53e+07 | 8.67e+07 | 6.98e+07 | 6.83e+07 |
18621 | 1.04e+08 | 6.55e+07 | 1.13e+08 | 9.21e+07 | 9.03e+07 |

david



More information about the gmp-devel mailing list