[Request for comments] Potential room for speedup when calculating divmod() of bases with many trailing 0 bits (in binary)

Mon Sep 21 15:30:10 UTC 2020

Ciao,

Il 2020-09-14 18:50 Shlomi Fish ha scritto:
> I was able to improve upon mpz_mod here:

> This is in case the number that one divides by is itself divisible by a
> large power of 2.

There are many special forms for the divisors that can stimulate writing 
special code to speed up the modulus computation. The form k*2^n is one 
of them.

> The benchmarks are:
>    - mpz_mod runs the benchmark in about 160 seconds.

Which benchmark? Does it involves general numbers or this special form 
only?
Which version of GMP are you using?

> Is there interest in incorporating such a change to the core GMP 
> library?

There already are tricks like that, in some internal functions of the 
library.
E.g. since GMP-5.0 (2010) "The function mpz_powm is [...] faster for 
even modulus, since it now partially factors such modulus [...]".
Should that form deserve some special lines of code in mpz_tdiv?

By the way, if you want to play with this, try the below patch for GMP, 
recompile it, and test your "benchmark" again.

----8<------8<----
diff -r b851f9336a86 mpz/tdiv_qr.c

--- a/mpz/tdiv_qr.c     Thu Sep 17 12:00:00 2020 +0200
+++ b/mpz/tdiv_qr.c     Mon Sep 21 17:11:50 2020 +0200
@@ -36,7 +36,7 @@
  void
  mpz_tdiv_qr (mpz_ptr quot, mpz_ptr rem, mpz_srcptr num, mpz_srcptr den)
  {
-  mp_size_t ql;
+  mp_size_t ql, n0;
    mp_size_t ns, ds, nl, dl;
    mp_ptr np, dp, qp, rp;
    TMP_DECL;
@@ -95,7 +95,12 @@
        np = tp;
      }

-  mpn_tdiv_qr (qp, rp, 0L, np, nl, dp, dl);
+  for (n0 = 0; *dp == 0; ++dp)
+    {
+      rp [n0++] = *np++;
+      --nl;
+    }
+  mpn_tdiv_qr (qp, rp + n0, 0L, np, nl, dp, dl - n0);

    ql -=  qp[ql - 1] == 0;
    MPN_NORMALIZE (rp, dl);
diff -r b851f9336a86 mpz/tdiv_r.c
--- a/mpz/tdiv_r.c      Thu Sep 17 12:00:00 2020 +0200
+++ b/mpz/tdiv_r.c      Mon Sep 21 17:11:50 2020 +0200
@@ -35,7 +35,7 @@
  void
  mpz_tdiv_r (mpz_ptr rem, mpz_srcptr num, mpz_srcptr den)
  {
-  mp_size_t ql;
+  mp_size_t ql, n0;
    mp_size_t ns, nl, dl;
    mp_ptr np, dp, qp, rp;
    TMP_DECL;
@@ -88,7 +88,12 @@
        np = tp;
      }

-  mpn_tdiv_qr (qp, rp, 0L, np, nl, dp, dl);
+  for (n0 = 0; *dp == 0; ++dp)
+    {
+      rp [n0++] = *np++;
+      --nl;
+    }
+  mpn_tdiv_qr (qp, rp + n0, 0L, np, nl, dp, dl - n0);

    MPN_NORMALIZE (rp, dl);

----8<------8<----

Maybe it can speed up your case!-)

Ĝis,
m

-- 
http://bodrato.it/papers/