dc_divappr_q_n
David Harvey
dmharvey at cims.nyu.edu
Thu Jul 2 18:21:13 CEST 2009
Hi,
I tried to write a middle-product-based dc_divappr_q_n, i.e. 2n * n
division, returning approximate n-limb quotient, using a divide-and-
conquer algorithm based on "Algorithm MP-divide" from "The Middle
Product Algorithm I" (Hanrot + Quercia + Zimmermann).
Code is available at
http://cims.nyu.edu/~harvey/mulmid/divappr.c
It seems to work... but it's a bit rough around the edges, and
performance data is mixed.
I compared against the GMP 4.3.1 implementations of:
* mpn_sb_divappr_q (schoolbook n * m division, returning approximate
quotient)
* mpn_dc_divappr_q_n (divide-and-conquer 2n * n division, returning
approximate quotient)
* mpn_sb_div_qr (schoolbook n * m division, returning quotient +
remainder)
* mpn_dc_div_qr_n (divide-and-conquer 2n * n division, returning
quotient + remainder)
* mpn_tdiv_qr (the only (?) public low-level division function in 4.3.1)
Note that mpn_dc_divappr_q_n uses the first four of the above as
subroutines, so I needed to tune the thresholds by hand. I got
DC_DIV_QR_THRESHOLD = 26, and DC_DIVAPPR_Q_THRESHOLD = 12. The latter
seems suspiciously low to me; I'm not sure if my profiling code is
broken or it really should be that low.
This is all on the K8, which is the only chip for which I have a
reasonable mulmid implementation.
Timings are below. Note that all columns include the time to re-copy
the overwritten {np,nn} buffer, except for mpn_tdiv_qr.
The new implementation only starts to beat schoolbook at around n =
35. It does eventually seem to beat the existing mpn_dc_divappr_q_n
at some point, perhaps around n = 250, but the numbers are very weird
and I can't seem to make them settle down. I'm not totally sure I
trust these numbers.
| sb_q | old_dc_q | new_dc_q | sb_qr | dc_qr |
tdiv_qr |
1 | 1.84e+02 | 0.00e+00 | 0.00e+00 | 0.00e+00 | 0.00e+00 | 0.00e
+00 |
2 | 2.64e+02 | 3.79e+02 | 0.00e+00 | 2.24e+02 | 4.54e+02 | 3.78e
+02 |
3 | 3.49e+02 | 4.52e+02 | 0.00e+00 | 2.74e+02 | 4.53e+02 | 5.63e
+02 |
4 | 4.18e+02 | 6.13e+02 | 0.00e+00 | 3.61e+02 | 5.60e+02 | 7.26e
+02 |
5 | 5.47e+02 | 5.86e+02 | 0.00e+00 | 4.22e+02 | 5.99e+02 | 8.62e
+02 |
6 | 6.45e+02 | 7.28e+02 | 1.03e+03 | 5.29e+02 | 7.30e+02 | 8.68e
+02 |
7 | 7.55e+02 | 8.03e+02 | 1.09e+03 | 5.98e+02 | 8.14e+02 | 1.01e
+03 |
8 | 8.52e+02 | 9.12e+02 | 1.22e+03 | 7.24e+02 | 9.40e+02 | 1.14e
+03 |
9 | 1.01e+03 | 1.03e+03 | 1.35e+03 | 8.21e+02 | 1.03e+03 | 1.19e
+03 |
10 | 1.14e+03 | 1.19e+03 | 1.53e+03 | 9.71e+02 | 1.16e+03 | 1.28e
+03 |
11 | 1.27e+03 | 1.27e+03 | 1.59e+03 | 1.05e+03 | 1.28e+03 | 1.40e
+03 |
12 | 1.46e+03 | 1.33e+03 | 1.72e+03 | 1.22e+03 | 1.42e+03 | 1.53e
+03 |
13 | 1.64e+03 | 1.39e+03 | 1.80e+03 | 1.35e+03 | 1.48e+03 | 1.67e
+03 |
14 | 1.77e+03 | 1.61e+03 | 2.04e+03 | 1.59e+03 | 1.75e+03 | 1.87e
+03 |
15 | 1.91e+03 | 1.69e+03 | 2.09e+03 | 1.64e+03 | 1.84e+03 | 1.90e
+03 |
16 | 2.13e+03 | 1.87e+03 | 2.33e+03 | 1.86e+03 | 1.96e+03 | 2.10e
+03 |
17 | 2.31e+03 | 2.06e+03 | 2.49e+03 | 2.05e+03 | 2.44e+03 | 2.35e
+03 |
18 | 2.50e+03 | 2.17e+03 | 2.73e+03 | 2.24e+03 | 2.28e+03 | 2.27e
+03 |
19 | 2.56e+03 | 2.30e+03 | 2.78e+03 | 2.25e+03 | 2.35e+03 | 2.44e
+03 |
20 | 2.87e+03 | 2.49e+03 | 3.03e+03 | 2.72e+03 | 2.76e+03 | 2.82e
+03 |
21 | 3.05e+03 | 2.64e+03 | 3.14e+03 | 2.81e+03 | 2.82e+03 | 2.91e
+03 |
22 | 3.24e+03 | 2.75e+03 | 3.41e+03 | 2.97e+03 | 3.07e+03 | 3.03e
+03 |
23 | 3.44e+03 | 3.02e+03 | 3.59e+03 | 3.23e+03 | 3.21e+03 | 3.20e
+03 |
25 | 3.86e+03 | 3.31e+03 | 3.96e+03 | 3.64e+03 | 3.62e+03 | 3.54e
+03 |
26 | 4.11e+03 | 3.53e+03 | 4.23e+03 | 3.93e+03 | 3.83e+03 | 3.74e
+03 |
27 | 4.32e+03 | 3.65e+03 | 4.45e+03 | 4.13e+03 | 4.07e+03 | 3.92e
+03 |
28 | 4.60e+03 | 3.93e+03 | 4.63e+03 | 4.41e+03 | 4.25e+03 | 4.14e
+03 |
30 | 5.02e+03 | 4.25e+03 | 5.00e+03 | 4.77e+03 | 4.62e+03 | 4.64e
+03 |
31 | 5.31e+03 | 4.52e+03 | 5.30e+03 | 5.22e+03 | 4.99e+03 | 4.84e
+03 |
33 | 5.64e+03 | 4.86e+03 | 5.72e+03 | 5.58e+03 | 5.34e+03 | 5.20e
+03 |
35 | 6.12e+03 | 5.12e+03 | 6.20e+03 | 6.17e+03 | 5.91e+03 | 5.71e
+03 |
36 | 6.63e+03 | 5.73e+03 | 6.55e+03 | 7.31e+03 | 6.29e+03 | 6.59e
+03 |
38 | 6.93e+03 | 5.91e+03 | 6.90e+03 | 7.57e+03 | 6.58e+03 | 6.84e
+03 |
40 | 7.34e+03 | 6.30e+03 | 7.34e+03 | 8.53e+03 | 7.28e+03 | 7.50e
+03 |
42 | 8.03e+03 | 7.11e+03 | 7.98e+03 | 9.41e+03 | 7.85e+03 | 8.00e
+03 |
44 | 8.71e+03 | 7.18e+03 | 8.55e+03 | 1.06e+04 | 8.75e+03 | 8.72e
+03 |
47 | 9.56e+03 | 8.51e+03 | 9.46e+03 | 1.16e+04 | 9.59e+03 | 9.72e
+03 |
49 | 1.01e+04 | 8.67e+03 | 1.00e+04 | 1.21e+04 | 1.06e+04 | 1.03e
+04 |
52 | 1.08e+04 | 9.57e+03 | 1.08e+04 | 1.31e+04 | 1.17e+04 | 1.12e
+04 |
54 | 1.16e+04 | 1.12e+04 | 1.14e+04 | 1.43e+04 | 1.27e+04 | 1.18e
+04 |
57 | 1.28e+04 | 1.15e+04 | 1.24e+04 | 1.63e+04 | 1.49e+04 | 1.31e
+04 |
60 | 1.34e+04 | 1.15e+04 | 1.35e+04 | 1.73e+04 | 1.42e+04 | 1.41e
+04 |
63 | 1.47e+04 | 1.26e+04 | 1.44e+04 | 1.93e+04 | 1.64e+04 | 1.53e
+04 |
66 | 1.63e+04 | 1.38e+04 | 1.57e+04 | 2.13e+04 | 1.70e+04 | 1.63e
+04 |
69 | 1.69e+04 | 1.48e+04 | 1.68e+04 | 2.29e+04 | 1.93e+04 | 1.76e
+04 |
73 | 1.88e+04 | 1.69e+04 | 1.85e+04 | 2.49e+04 | 2.06e+04 | 1.94e
+04 |
76 | 1.93e+04 | 1.69e+04 | 1.96e+04 | 2.52e+04 | 1.98e+04 | 2.04e
+04 |
80 | 2.13e+04 | 1.82e+04 | 2.13e+04 | 2.91e+04 | 2.25e+04 | 2.26e
+04 |
84 | 2.32e+04 | 1.98e+04 | 2.27e+04 | 3.35e+04 | 2.56e+04 | 2.48e
+04 |
89 | 2.50e+04 | 2.19e+04 | 2.45e+04 | 3.55e+04 | 2.83e+04 | 2.70e
+04 |
93 | 2.70e+04 | 2.42e+04 | 2.61e+04 | 3.84e+04 | 3.05e+04 | 2.91e
+04 |
98 | 2.92e+04 | 2.53e+04 | 2.83e+04 | 4.25e+04 | 3.30e+04 | 3.20e
+04 |
103 | 3.29e+04 | 2.81e+04 | 3.02e+04 | 4.88e+04 | 3.65e+04 | 3.53e
+04 |
108 | 3.45e+04 | 3.16e+04 | 3.23e+04 | 5.11e+04 | 3.83e+04 | 3.67e
+04 |
113 | 3.66e+04 | 3.34e+04 | 3.43e+04 | 5.34e+04 | 4.35e+04 | 3.96e
+04 |
119 | 4.01e+04 | 3.88e+04 | 3.69e+04 | 5.95e+04 | 4.56e+04 | 4.37e
+04 |
125 | 4.47e+04 | 4.30e+04 | 4.00e+04 | 6.74e+04 | 5.68e+04 | 4.72e
+04 |
131 | 4.85e+04 | 4.24e+04 | 4.28e+04 | 7.40e+04 | 5.54e+04 | 5.11e
+04 |
138 | 5.20e+04 | 4.46e+04 | 4.74e+04 | 8.10e+04 | 5.84e+04 | 5.56e
+04 |
144 | 5.47e+04 | 4.99e+04 | 5.00e+04 | 8.52e+04 | 5.77e+04 | 5.81e
+04 |
152 | 6.23e+04 | 4.92e+04 | 5.60e+04 | 9.46e+04 | 6.21e+04 | 6.32e
+04 |
159 | 6.65e+04 | 5.47e+04 | 5.89e+04 | 1.07e+05 | 7.27e+04 | 6.84e
+04 |
167 | 7.23e+04 | 5.94e+04 | 6.29e+04 | 1.16e+05 | 8.55e+04 | 7.81e
+04 |
176 | 7.89e+04 | 6.12e+04 | 6.70e+04 | 1.26e+05 | 7.98e+04 | 7.98e
+04 |
185 | 8.66e+04 | 7.38e+04 | 7.31e+04 | 1.37e+05 | 9.36e+04 | 8.72e
+04 |
194 | 9.85e+04 | 7.71e+04 | 7.82e+04 | 1.64e+05 | 1.02e+05 | 9.61e
+04 |
204 | 1.06e+05 | 8.46e+04 | 8.37e+04 | 1.74e+05 | 1.09e+05 | 1.05e
+05 |
214 | 1.18e+05 | 8.98e+04 | 8.97e+04 | 1.93e+05 | 1.19e+05 | 1.13e
+05 |
224 | 1.27e+05 | 9.60e+04 | 9.64e+04 | 2.03e+05 | 1.21e+05 | 1.19e
+05 |
236 | 1.37e+05 | 1.39e+05 | 1.04e+05 | 2.24e+05 | 1.40e+05 | 1.31e
+05 |
247 | 1.45e+05 | 1.26e+05 | 1.10e+05 | 2.32e+05 | 1.53e+05 | 1.42e
+05 |
260 | 1.67e+05 | 1.26e+05 | 1.21e+05 | 2.92e+05 | 1.67e+05 | 1.55e
+05 |
273 | 1.71e+05 | 1.35e+05 | 1.30e+05 | 2.85e+05 | 1.77e+05 | 1.64e
+05 |
287 | 1.87e+05 | 1.39e+05 | 1.40e+05 | 3.18e+05 | 1.82e+05 | 1.81e
+05 |
301 | 2.06e+05 | 1.55e+05 | 1.54e+05 | 3.65e+05 | 2.12e+05 | 1.92e
+05 |
316 | 2.28e+05 | 1.68e+05 | 1.64e+05 | 3.93e+05 | 2.22e+05 | 2.05e
+05 |
332 | 2.50e+05 | 1.90e+05 | 1.77e+05 | 4.34e+05 | 2.34e+05 | 2.30e
+05 |
348 | 2.77e+05 | 2.14e+05 | 1.88e+05 | 4.83e+05 | 2.50e+05 | 2.45e
+05 |
366 | 3.06e+05 | 2.15e+05 | 2.03e+05 | 5.21e+05 | 2.70e+05 | 2.66e
+05 |
384 | 3.22e+05 | 2.12e+05 | 2.17e+05 | 5.85e+05 | 2.82e+05 | 2.79e
+05 |
403 | 3.49e+05 | 2.49e+05 | 2.31e+05 | 6.19e+05 | 3.31e+05 | 3.09e
+05 |
424 | 4.03e+05 | 2.64e+05 | 2.49e+05 | 7.31e+05 | 3.47e+05 | 3.31e
+05 |
445 | 4.22e+05 | 2.97e+05 | 2.69e+05 | 7.25e+05 | 3.77e+05 | 3.55e
+05 |
467 | 4.65e+05 | 3.26e+05 | 2.88e+05 | 8.25e+05 | 4.32e+05 | 3.86e
+05 |
490 | 5.21e+05 | 3.55e+05 | 3.11e+05 | 9.44e+05 | 4.59e+05 | 4.20e
+05 |
515 | 5.85e+05 | 4.10e+05 | 3.38e+05 | 1.03e+06 | 4.95e+05 | 4.46e
+05 |
541 | 6.08e+05 | 3.94e+05 | 3.67e+05 | 1.09e+06 | 5.23e+05 | 4.87e
+05 |
568 | 6.74e+05 | 6.30e+05 | 3.95e+05 | 1.18e+06 | 5.35e+05 | 5.23e
+05 |
596 | 7.35e+05 | 4.46e+05 | 4.35e+05 | 1.34e+06 | 5.88e+05 | 5.65e
+05 |
626 | 8.40e+05 | 4.93e+05 | 4.67e+05 | 1.48e+06 | 6.40e+05 | 6.00e
+05 |
657 | 8.85e+05 | 5.56e+05 | 4.99e+05 | 1.56e+06 | 7.20e+05 | 6.55e
+05 |
690 | 9.78e+05 | 5.66e+05 | 5.34e+05 | 1.78e+06 | 7.36e+05 | 7.09e
+05 |
725 | 1.07e+06 | 8.73e+05 | 5.69e+05 | 1.87e+06 | 8.35e+05 | 7.72e
+05 |
761 | 1.22e+06 | 6.86e+05 | 6.18e+05 | 2.21e+06 | 8.87e+05 | 8.30e
+05 |
799 | 1.29e+06 | 7.19e+05 | 6.68e+05 | 2.42e+06 | 9.69e+05 | 8.99e
+05 |
839 | 1.50e+06 | 8.45e+05 | 6.98e+05 | 2.78e+06 | 1.17e+06 | 9.73e
+05 |
881 | 1.62e+06 | 8.66e+05 | 7.63e+05 | 3.00e+06 | 1.13e+06 | 1.03e
+06 |
925 | 1.77e+06 | 9.50e+05 | 8.30e+05 | 3.21e+06 | 1.26e+06 | 1.12e
+06 |
972 | 1.89e+06 | 9.77e+05 | 8.74e+05 | 3.55e+06 | 1.28e+06 | 1.21e
+06 |
1020 | 2.02e+06 | 1.03e+06 | 9.58e+05 | 3.65e+06 | 1.32e+06 | 1.30e
+06 |
1071 | 2.37e+06 | 1.20e+06 | 1.04e+06 | 4.32e+06 | 1.57e+06 | 1.41e
+06 |
1125 | 2.58e+06 | 1.30e+06 | 1.12e+06 | 4.51e+06 | 1.70e+06 | 1.52e
+06 |
1181 | 2.75e+06 | 1.36e+06 | 1.23e+06 | 5.16e+06 | 1.82e+06 | 1.65e
+06 |
1240 | 3.08e+06 | 1.38e+06 | 1.31e+06 | 5.68e+06 | 1.80e+06 | 1.75e
+06 |
1302 | 3.29e+06 | 1.56e+06 | 1.40e+06 | 6.27e+06 | 2.03e+06 | 1.88e
+06 |
1367 | 3.64e+06 | 1.71e+06 | 1.50e+06 | 7.08e+06 | 2.30e+06 | 2.06e
+06 |
1436 | 4.27e+06 | 1.77e+06 | 1.62e+06 | 8.01e+06 | 2.28e+06 | 2.21e
+06 |
1507 | 4.59e+06 | 2.05e+06 | 1.74e+06 | 8.74e+06 | 2.72e+06 | 2.38e
+06 |
1583 | 4.86e+06 | 2.52e+06 | 1.90e+06 | 9.17e+06 | 2.95e+06 | 2.60e
+06 |
1662 | 5.31e+06 | 2.29e+06 | 2.05e+06 | 9.77e+06 | 2.90e+06 | 2.78e
+06 |
1745 | 5.97e+06 | 2.62e+06 | 2.17e+06 | 1.09e+07 | 3.45e+06 | 2.94e
+06 |
1832 | 6.72e+06 | 2.61e+06 | 2.37e+06 | 1.21e+07 | 3.32e+06 | 3.17e
+06 |
1924 | 7.28e+06 | 2.78e+06 | 2.52e+06 | 1.31e+07 | 3.58e+06 | 3.43e
+06 |
david
More information about the gmp-devel
mailing list