dc_divappr_q_n

David Harvey dmharvey at cims.nyu.edu
Thu Jul 2 18:21:13 CEST 2009


Hi,

I tried to write a middle-product-based dc_divappr_q_n, i.e. 2n * n  
division, returning approximate n-limb quotient, using a divide-and- 
conquer algorithm based on "Algorithm MP-divide" from "The Middle  
Product Algorithm I" (Hanrot + Quercia + Zimmermann).

Code is available at

http://cims.nyu.edu/~harvey/mulmid/divappr.c

It seems to work... but it's a bit rough around the edges, and  
performance data is mixed.

I compared against the GMP 4.3.1 implementations of:

* mpn_sb_divappr_q (schoolbook n * m division, returning approximate  
quotient)
* mpn_dc_divappr_q_n (divide-and-conquer 2n * n division, returning  
approximate quotient)
* mpn_sb_div_qr (schoolbook n * m division, returning quotient +  
remainder)
* mpn_dc_div_qr_n (divide-and-conquer 2n * n division, returning  
quotient + remainder)
* mpn_tdiv_qr (the only (?) public low-level division function in 4.3.1)

Note that mpn_dc_divappr_q_n uses the first four of the above as  
subroutines, so I needed to tune the thresholds by hand. I got  
DC_DIV_QR_THRESHOLD = 26, and DC_DIVAPPR_Q_THRESHOLD = 12. The latter  
seems suspiciously low to me; I'm not sure if my profiling code is  
broken or it really should be that low.

This is all on the K8, which is the only chip for which I have a  
reasonable mulmid implementation.

Timings are below. Note that all columns include the time to re-copy  
the overwritten {np,nn} buffer, except for mpn_tdiv_qr.

The new implementation only starts to beat schoolbook at around n =  
35. It does eventually seem to beat the existing mpn_dc_divappr_q_n  
at some point, perhaps around n = 250, but the numbers are very weird  
and I can't seem to make them settle down. I'm not totally sure I  
trust these numbers.

       |   sb_q   | old_dc_q | new_dc_q |  sb_qr   |  dc_qr   |  
tdiv_qr  |
     1 | 1.84e+02 | 0.00e+00 | 0.00e+00 | 0.00e+00 | 0.00e+00 | 0.00e 
+00 |
     2 | 2.64e+02 | 3.79e+02 | 0.00e+00 | 2.24e+02 | 4.54e+02 | 3.78e 
+02 |
     3 | 3.49e+02 | 4.52e+02 | 0.00e+00 | 2.74e+02 | 4.53e+02 | 5.63e 
+02 |
     4 | 4.18e+02 | 6.13e+02 | 0.00e+00 | 3.61e+02 | 5.60e+02 | 7.26e 
+02 |
     5 | 5.47e+02 | 5.86e+02 | 0.00e+00 | 4.22e+02 | 5.99e+02 | 8.62e 
+02 |
     6 | 6.45e+02 | 7.28e+02 | 1.03e+03 | 5.29e+02 | 7.30e+02 | 8.68e 
+02 |
     7 | 7.55e+02 | 8.03e+02 | 1.09e+03 | 5.98e+02 | 8.14e+02 | 1.01e 
+03 |
     8 | 8.52e+02 | 9.12e+02 | 1.22e+03 | 7.24e+02 | 9.40e+02 | 1.14e 
+03 |
     9 | 1.01e+03 | 1.03e+03 | 1.35e+03 | 8.21e+02 | 1.03e+03 | 1.19e 
+03 |
    10 | 1.14e+03 | 1.19e+03 | 1.53e+03 | 9.71e+02 | 1.16e+03 | 1.28e 
+03 |
    11 | 1.27e+03 | 1.27e+03 | 1.59e+03 | 1.05e+03 | 1.28e+03 | 1.40e 
+03 |
    12 | 1.46e+03 | 1.33e+03 | 1.72e+03 | 1.22e+03 | 1.42e+03 | 1.53e 
+03 |
    13 | 1.64e+03 | 1.39e+03 | 1.80e+03 | 1.35e+03 | 1.48e+03 | 1.67e 
+03 |
    14 | 1.77e+03 | 1.61e+03 | 2.04e+03 | 1.59e+03 | 1.75e+03 | 1.87e 
+03 |
    15 | 1.91e+03 | 1.69e+03 | 2.09e+03 | 1.64e+03 | 1.84e+03 | 1.90e 
+03 |
    16 | 2.13e+03 | 1.87e+03 | 2.33e+03 | 1.86e+03 | 1.96e+03 | 2.10e 
+03 |
    17 | 2.31e+03 | 2.06e+03 | 2.49e+03 | 2.05e+03 | 2.44e+03 | 2.35e 
+03 |
    18 | 2.50e+03 | 2.17e+03 | 2.73e+03 | 2.24e+03 | 2.28e+03 | 2.27e 
+03 |
    19 | 2.56e+03 | 2.30e+03 | 2.78e+03 | 2.25e+03 | 2.35e+03 | 2.44e 
+03 |
    20 | 2.87e+03 | 2.49e+03 | 3.03e+03 | 2.72e+03 | 2.76e+03 | 2.82e 
+03 |
    21 | 3.05e+03 | 2.64e+03 | 3.14e+03 | 2.81e+03 | 2.82e+03 | 2.91e 
+03 |
    22 | 3.24e+03 | 2.75e+03 | 3.41e+03 | 2.97e+03 | 3.07e+03 | 3.03e 
+03 |
    23 | 3.44e+03 | 3.02e+03 | 3.59e+03 | 3.23e+03 | 3.21e+03 | 3.20e 
+03 |
    25 | 3.86e+03 | 3.31e+03 | 3.96e+03 | 3.64e+03 | 3.62e+03 | 3.54e 
+03 |
    26 | 4.11e+03 | 3.53e+03 | 4.23e+03 | 3.93e+03 | 3.83e+03 | 3.74e 
+03 |
    27 | 4.32e+03 | 3.65e+03 | 4.45e+03 | 4.13e+03 | 4.07e+03 | 3.92e 
+03 |
    28 | 4.60e+03 | 3.93e+03 | 4.63e+03 | 4.41e+03 | 4.25e+03 | 4.14e 
+03 |
    30 | 5.02e+03 | 4.25e+03 | 5.00e+03 | 4.77e+03 | 4.62e+03 | 4.64e 
+03 |
    31 | 5.31e+03 | 4.52e+03 | 5.30e+03 | 5.22e+03 | 4.99e+03 | 4.84e 
+03 |
    33 | 5.64e+03 | 4.86e+03 | 5.72e+03 | 5.58e+03 | 5.34e+03 | 5.20e 
+03 |
    35 | 6.12e+03 | 5.12e+03 | 6.20e+03 | 6.17e+03 | 5.91e+03 | 5.71e 
+03 |
    36 | 6.63e+03 | 5.73e+03 | 6.55e+03 | 7.31e+03 | 6.29e+03 | 6.59e 
+03 |
    38 | 6.93e+03 | 5.91e+03 | 6.90e+03 | 7.57e+03 | 6.58e+03 | 6.84e 
+03 |
    40 | 7.34e+03 | 6.30e+03 | 7.34e+03 | 8.53e+03 | 7.28e+03 | 7.50e 
+03 |
    42 | 8.03e+03 | 7.11e+03 | 7.98e+03 | 9.41e+03 | 7.85e+03 | 8.00e 
+03 |
    44 | 8.71e+03 | 7.18e+03 | 8.55e+03 | 1.06e+04 | 8.75e+03 | 8.72e 
+03 |
    47 | 9.56e+03 | 8.51e+03 | 9.46e+03 | 1.16e+04 | 9.59e+03 | 9.72e 
+03 |
    49 | 1.01e+04 | 8.67e+03 | 1.00e+04 | 1.21e+04 | 1.06e+04 | 1.03e 
+04 |
    52 | 1.08e+04 | 9.57e+03 | 1.08e+04 | 1.31e+04 | 1.17e+04 | 1.12e 
+04 |
    54 | 1.16e+04 | 1.12e+04 | 1.14e+04 | 1.43e+04 | 1.27e+04 | 1.18e 
+04 |
    57 | 1.28e+04 | 1.15e+04 | 1.24e+04 | 1.63e+04 | 1.49e+04 | 1.31e 
+04 |
    60 | 1.34e+04 | 1.15e+04 | 1.35e+04 | 1.73e+04 | 1.42e+04 | 1.41e 
+04 |
    63 | 1.47e+04 | 1.26e+04 | 1.44e+04 | 1.93e+04 | 1.64e+04 | 1.53e 
+04 |
    66 | 1.63e+04 | 1.38e+04 | 1.57e+04 | 2.13e+04 | 1.70e+04 | 1.63e 
+04 |
    69 | 1.69e+04 | 1.48e+04 | 1.68e+04 | 2.29e+04 | 1.93e+04 | 1.76e 
+04 |
    73 | 1.88e+04 | 1.69e+04 | 1.85e+04 | 2.49e+04 | 2.06e+04 | 1.94e 
+04 |
    76 | 1.93e+04 | 1.69e+04 | 1.96e+04 | 2.52e+04 | 1.98e+04 | 2.04e 
+04 |
    80 | 2.13e+04 | 1.82e+04 | 2.13e+04 | 2.91e+04 | 2.25e+04 | 2.26e 
+04 |
    84 | 2.32e+04 | 1.98e+04 | 2.27e+04 | 3.35e+04 | 2.56e+04 | 2.48e 
+04 |
    89 | 2.50e+04 | 2.19e+04 | 2.45e+04 | 3.55e+04 | 2.83e+04 | 2.70e 
+04 |
    93 | 2.70e+04 | 2.42e+04 | 2.61e+04 | 3.84e+04 | 3.05e+04 | 2.91e 
+04 |
    98 | 2.92e+04 | 2.53e+04 | 2.83e+04 | 4.25e+04 | 3.30e+04 | 3.20e 
+04 |
   103 | 3.29e+04 | 2.81e+04 | 3.02e+04 | 4.88e+04 | 3.65e+04 | 3.53e 
+04 |
   108 | 3.45e+04 | 3.16e+04 | 3.23e+04 | 5.11e+04 | 3.83e+04 | 3.67e 
+04 |
   113 | 3.66e+04 | 3.34e+04 | 3.43e+04 | 5.34e+04 | 4.35e+04 | 3.96e 
+04 |
   119 | 4.01e+04 | 3.88e+04 | 3.69e+04 | 5.95e+04 | 4.56e+04 | 4.37e 
+04 |
   125 | 4.47e+04 | 4.30e+04 | 4.00e+04 | 6.74e+04 | 5.68e+04 | 4.72e 
+04 |
   131 | 4.85e+04 | 4.24e+04 | 4.28e+04 | 7.40e+04 | 5.54e+04 | 5.11e 
+04 |
   138 | 5.20e+04 | 4.46e+04 | 4.74e+04 | 8.10e+04 | 5.84e+04 | 5.56e 
+04 |
   144 | 5.47e+04 | 4.99e+04 | 5.00e+04 | 8.52e+04 | 5.77e+04 | 5.81e 
+04 |
   152 | 6.23e+04 | 4.92e+04 | 5.60e+04 | 9.46e+04 | 6.21e+04 | 6.32e 
+04 |
   159 | 6.65e+04 | 5.47e+04 | 5.89e+04 | 1.07e+05 | 7.27e+04 | 6.84e 
+04 |
   167 | 7.23e+04 | 5.94e+04 | 6.29e+04 | 1.16e+05 | 8.55e+04 | 7.81e 
+04 |
   176 | 7.89e+04 | 6.12e+04 | 6.70e+04 | 1.26e+05 | 7.98e+04 | 7.98e 
+04 |
   185 | 8.66e+04 | 7.38e+04 | 7.31e+04 | 1.37e+05 | 9.36e+04 | 8.72e 
+04 |
   194 | 9.85e+04 | 7.71e+04 | 7.82e+04 | 1.64e+05 | 1.02e+05 | 9.61e 
+04 |
   204 | 1.06e+05 | 8.46e+04 | 8.37e+04 | 1.74e+05 | 1.09e+05 | 1.05e 
+05 |
   214 | 1.18e+05 | 8.98e+04 | 8.97e+04 | 1.93e+05 | 1.19e+05 | 1.13e 
+05 |
   224 | 1.27e+05 | 9.60e+04 | 9.64e+04 | 2.03e+05 | 1.21e+05 | 1.19e 
+05 |
   236 | 1.37e+05 | 1.39e+05 | 1.04e+05 | 2.24e+05 | 1.40e+05 | 1.31e 
+05 |
   247 | 1.45e+05 | 1.26e+05 | 1.10e+05 | 2.32e+05 | 1.53e+05 | 1.42e 
+05 |
   260 | 1.67e+05 | 1.26e+05 | 1.21e+05 | 2.92e+05 | 1.67e+05 | 1.55e 
+05 |
   273 | 1.71e+05 | 1.35e+05 | 1.30e+05 | 2.85e+05 | 1.77e+05 | 1.64e 
+05 |
   287 | 1.87e+05 | 1.39e+05 | 1.40e+05 | 3.18e+05 | 1.82e+05 | 1.81e 
+05 |
   301 | 2.06e+05 | 1.55e+05 | 1.54e+05 | 3.65e+05 | 2.12e+05 | 1.92e 
+05 |
   316 | 2.28e+05 | 1.68e+05 | 1.64e+05 | 3.93e+05 | 2.22e+05 | 2.05e 
+05 |
   332 | 2.50e+05 | 1.90e+05 | 1.77e+05 | 4.34e+05 | 2.34e+05 | 2.30e 
+05 |
   348 | 2.77e+05 | 2.14e+05 | 1.88e+05 | 4.83e+05 | 2.50e+05 | 2.45e 
+05 |
   366 | 3.06e+05 | 2.15e+05 | 2.03e+05 | 5.21e+05 | 2.70e+05 | 2.66e 
+05 |
   384 | 3.22e+05 | 2.12e+05 | 2.17e+05 | 5.85e+05 | 2.82e+05 | 2.79e 
+05 |
   403 | 3.49e+05 | 2.49e+05 | 2.31e+05 | 6.19e+05 | 3.31e+05 | 3.09e 
+05 |
   424 | 4.03e+05 | 2.64e+05 | 2.49e+05 | 7.31e+05 | 3.47e+05 | 3.31e 
+05 |
   445 | 4.22e+05 | 2.97e+05 | 2.69e+05 | 7.25e+05 | 3.77e+05 | 3.55e 
+05 |
   467 | 4.65e+05 | 3.26e+05 | 2.88e+05 | 8.25e+05 | 4.32e+05 | 3.86e 
+05 |
   490 | 5.21e+05 | 3.55e+05 | 3.11e+05 | 9.44e+05 | 4.59e+05 | 4.20e 
+05 |
   515 | 5.85e+05 | 4.10e+05 | 3.38e+05 | 1.03e+06 | 4.95e+05 | 4.46e 
+05 |
   541 | 6.08e+05 | 3.94e+05 | 3.67e+05 | 1.09e+06 | 5.23e+05 | 4.87e 
+05 |
   568 | 6.74e+05 | 6.30e+05 | 3.95e+05 | 1.18e+06 | 5.35e+05 | 5.23e 
+05 |
   596 | 7.35e+05 | 4.46e+05 | 4.35e+05 | 1.34e+06 | 5.88e+05 | 5.65e 
+05 |
   626 | 8.40e+05 | 4.93e+05 | 4.67e+05 | 1.48e+06 | 6.40e+05 | 6.00e 
+05 |
   657 | 8.85e+05 | 5.56e+05 | 4.99e+05 | 1.56e+06 | 7.20e+05 | 6.55e 
+05 |
   690 | 9.78e+05 | 5.66e+05 | 5.34e+05 | 1.78e+06 | 7.36e+05 | 7.09e 
+05 |
   725 | 1.07e+06 | 8.73e+05 | 5.69e+05 | 1.87e+06 | 8.35e+05 | 7.72e 
+05 |
   761 | 1.22e+06 | 6.86e+05 | 6.18e+05 | 2.21e+06 | 8.87e+05 | 8.30e 
+05 |
   799 | 1.29e+06 | 7.19e+05 | 6.68e+05 | 2.42e+06 | 9.69e+05 | 8.99e 
+05 |
   839 | 1.50e+06 | 8.45e+05 | 6.98e+05 | 2.78e+06 | 1.17e+06 | 9.73e 
+05 |
   881 | 1.62e+06 | 8.66e+05 | 7.63e+05 | 3.00e+06 | 1.13e+06 | 1.03e 
+06 |
   925 | 1.77e+06 | 9.50e+05 | 8.30e+05 | 3.21e+06 | 1.26e+06 | 1.12e 
+06 |
   972 | 1.89e+06 | 9.77e+05 | 8.74e+05 | 3.55e+06 | 1.28e+06 | 1.21e 
+06 |
  1020 | 2.02e+06 | 1.03e+06 | 9.58e+05 | 3.65e+06 | 1.32e+06 | 1.30e 
+06 |
  1071 | 2.37e+06 | 1.20e+06 | 1.04e+06 | 4.32e+06 | 1.57e+06 | 1.41e 
+06 |
  1125 | 2.58e+06 | 1.30e+06 | 1.12e+06 | 4.51e+06 | 1.70e+06 | 1.52e 
+06 |
  1181 | 2.75e+06 | 1.36e+06 | 1.23e+06 | 5.16e+06 | 1.82e+06 | 1.65e 
+06 |
  1240 | 3.08e+06 | 1.38e+06 | 1.31e+06 | 5.68e+06 | 1.80e+06 | 1.75e 
+06 |
  1302 | 3.29e+06 | 1.56e+06 | 1.40e+06 | 6.27e+06 | 2.03e+06 | 1.88e 
+06 |
  1367 | 3.64e+06 | 1.71e+06 | 1.50e+06 | 7.08e+06 | 2.30e+06 | 2.06e 
+06 |
  1436 | 4.27e+06 | 1.77e+06 | 1.62e+06 | 8.01e+06 | 2.28e+06 | 2.21e 
+06 |
  1507 | 4.59e+06 | 2.05e+06 | 1.74e+06 | 8.74e+06 | 2.72e+06 | 2.38e 
+06 |
  1583 | 4.86e+06 | 2.52e+06 | 1.90e+06 | 9.17e+06 | 2.95e+06 | 2.60e 
+06 |
  1662 | 5.31e+06 | 2.29e+06 | 2.05e+06 | 9.77e+06 | 2.90e+06 | 2.78e 
+06 |
  1745 | 5.97e+06 | 2.62e+06 | 2.17e+06 | 1.09e+07 | 3.45e+06 | 2.94e 
+06 |
  1832 | 6.72e+06 | 2.61e+06 | 2.37e+06 | 1.21e+07 | 3.32e+06 | 3.17e 
+06 |
  1924 | 7.28e+06 | 2.78e+06 | 2.52e+06 | 1.31e+07 | 3.58e+06 | 3.43e 
+06 |

david



More information about the gmp-devel mailing list