[Gmp-commit] /var/hg/gmp: Micro-optimization and docs for mpn_div_qr_1n_pi1
mercurial at gmplib.org
mercurial at gmplib.org
Thu Jun 3 21:51:18 UTC 2021
details: /var/hg/gmp/rev/a9f0db9f7199
changeset: 18221:a9f0db9f7199
user: Niels Möller <nisse at lysator.liu.se>
date: Thu Jun 03 23:50:08 2021 +0200
description:
Micro-optimization and docs for mpn_div_qr_1n_pi1
diffstat:
ChangeLog | 5 +++++
mpn/generic/div_qr_1n_pi1.c | 27 ++++++++++++++++++++++-----
2 files changed, 27 insertions(+), 5 deletions(-)
diffs (62 lines):
diff -r 1b45731027e8 -r a9f0db9f7199 ChangeLog
--- a/ChangeLog Tue May 25 21:11:14 2021 +0200
+++ b/ChangeLog Thu Jun 03 23:50:08 2021 +0200
@@ -1,3 +1,8 @@
+2021-06-03 Niels Möller <nisse at lysator.liu.se>
+
+ * mpn/generic/div_qr_1n_pi1.c (mpn_div_qr_1n_pi1): Micro-optimize
+ method 2, and document the main idea of the algorithm.
+
2021-05-25 Marco Bodrato <bodrato at mail.dm.unipi.it>
* mpn/generic/sec_powm.c (sec_binvert_limb): New static function.
diff -r 1b45731027e8 -r a9f0db9f7199 mpn/generic/div_qr_1n_pi1.c
--- a/mpn/generic/div_qr_1n_pi1.c Tue May 25 21:11:14 2021 +0200
+++ b/mpn/generic/div_qr_1n_pi1.c Thu Jun 03 23:50:08 2021 +0200
@@ -193,6 +193,23 @@
#elif DIV_QR_1N_METHOD == 2
+/* The main idea of this algorithm is to write B^2 = d (B + dinv) +
+ B2, where 1 <= B2 < d. Similarly to mpn_mod_1_1p, each iteration
+ can then replace
+
+ u1 B^2 = u1 B2 (mod d)
+
+ which gives a very short critical path for computing the remainder
+ (with some tricks to handle the carry when the next two lower limbs
+ are added in). To also get the quotient, include the corresponding
+ multiple of d in the expression,
+
+ u1 B^2 = u1 B2 + (u1 dinv + u1 B) d
+
+ We get the quotient by accumulating the (u1 dinv + u1 B) terms. The
+ two multiplies, u1 * B2 and u1 * dinv, are independent, and can be
+ executed in parallel.
+ */
mp_limb_t
mpn_div_qr_1n_pi1 (mp_ptr qp, mp_srcptr up, mp_size_t n, mp_limb_t u1,
mp_limb_t d, mp_limb_t dinv)
@@ -248,17 +265,17 @@
* +---+---+---+
*/
umul_ppmm (p1, t, u1, dinv);
+ ADDC_LIMB (cy, u0, u0, u2 & B2);
+ u0 -= (-cy) & d;
add_ssaaaa (q2, q1, -u2, u2 & dinv, CNST_LIMB(0), u1);
- add_ssaaaa (q2, q1, q2, q1, CNST_LIMB(0), p1);
add_ssaaaa (q2, q1, q2, q1, CNST_LIMB(0), q0);
q0 = t;
+ /* Note that p1 + cy cannot overflow */
+ add_ssaaaa (q2, q1, q2, q1, CNST_LIMB(0), p1 + cy);
+
umul_ppmm (p1, p0, u1, B2);
- ADDC_LIMB (cy, u0, u0, u2 & B2);
- u0 -= (-cy) & d;
- /* Final q update */
- add_ssaaaa (q2, q1, q2, q1, CNST_LIMB(0), cy);
qp[j+1] = q1;
MPN_INCR_U (qp+j+2, n-j-2, q2);
More information about the gmp-commit
mailing list