[Gmp-commit] /var/hg/gmp: Micro-optimization and docs for mpn_div_qr_1n_pi1

mercurial at gmplib.org mercurial at gmplib.org
Thu Jun 3 21:51:18 UTC 2021


details:   /var/hg/gmp/rev/a9f0db9f7199
changeset: 18221:a9f0db9f7199
user:      Niels Möller <nisse at lysator.liu.se>
date:      Thu Jun 03 23:50:08 2021 +0200
description:
Micro-optimization and docs for mpn_div_qr_1n_pi1

diffstat:

 ChangeLog                   |   5 +++++
 mpn/generic/div_qr_1n_pi1.c |  27 ++++++++++++++++++++++-----
 2 files changed, 27 insertions(+), 5 deletions(-)

diffs (62 lines):

diff -r 1b45731027e8 -r a9f0db9f7199 ChangeLog
--- a/ChangeLog	Tue May 25 21:11:14 2021 +0200
+++ b/ChangeLog	Thu Jun 03 23:50:08 2021 +0200
@@ -1,3 +1,8 @@
+2021-06-03  Niels Möller  <nisse at lysator.liu.se>
+
+	* mpn/generic/div_qr_1n_pi1.c (mpn_div_qr_1n_pi1): Micro-optimize
+	method 2, and document the main idea of the algorithm.
+
 2021-05-25 Marco Bodrato <bodrato at mail.dm.unipi.it>
 
 	* mpn/generic/sec_powm.c (sec_binvert_limb): New static function.
diff -r 1b45731027e8 -r a9f0db9f7199 mpn/generic/div_qr_1n_pi1.c
--- a/mpn/generic/div_qr_1n_pi1.c	Tue May 25 21:11:14 2021 +0200
+++ b/mpn/generic/div_qr_1n_pi1.c	Thu Jun 03 23:50:08 2021 +0200
@@ -193,6 +193,23 @@
 
 #elif DIV_QR_1N_METHOD == 2
 
+/* The main idea of this algorithm is to write B^2 = d (B + dinv) +
+   B2, where 1 <= B2 < d. Similarly to mpn_mod_1_1p, each iteration
+   can then replace
+
+     u1 B^2 = u1 B2 (mod d)
+
+   which gives a very short critical path for computing the remainder
+   (with some tricks to handle the carry when the next two lower limbs
+   are added in). To also get the quotient, include the corresponding
+   multiple of d in the expression,
+
+     u1 B^2 = u1 B2 + (u1 dinv + u1 B) d
+
+   We get the quotient by accumulating the (u1 dinv + u1 B) terms. The
+   two multiplies, u1 * B2 and u1 * dinv, are independent, and can be
+   executed in parallel.
+ */
 mp_limb_t
 mpn_div_qr_1n_pi1 (mp_ptr qp, mp_srcptr up, mp_size_t n, mp_limb_t u1,
 		   mp_limb_t d, mp_limb_t dinv)
@@ -248,17 +265,17 @@
        *    +---+---+---+
       */
       umul_ppmm (p1, t, u1, dinv);
+      ADDC_LIMB (cy, u0, u0, u2 & B2);
+      u0 -= (-cy) & d;
       add_ssaaaa (q2, q1, -u2, u2 & dinv, CNST_LIMB(0), u1);
-      add_ssaaaa (q2, q1, q2, q1, CNST_LIMB(0), p1);
       add_ssaaaa (q2, q1, q2, q1, CNST_LIMB(0), q0);
       q0 = t;
 
+      /* Note that p1 + cy cannot overflow */
+      add_ssaaaa (q2, q1, q2, q1, CNST_LIMB(0), p1 + cy);
+
       umul_ppmm (p1, p0, u1, B2);
-      ADDC_LIMB (cy, u0, u0, u2 & B2);
-      u0 -= (-cy) & d;
 
-      /* Final q update */
-      add_ssaaaa (q2, q1, q2, q1, CNST_LIMB(0), cy);
       qp[j+1] = q1;
       MPN_INCR_U (qp+j+2, n-j-2, q2);
 


More information about the gmp-commit mailing list