[Request for comments] Potential room for speedup when calculating divmod() of bases with many trailing 0 bits (in binary)
Shlomi Fish
shlomif at gmail.com
Tue Sep 22 11:31:23 UTC 2020
Hi Marco!
Thanks for the reply.
On Mon, Sep 21, 2020 at 6:30 PM Marco Bodrato <bodrato at mail.dm.unipi.it> wrote:
>
> Ciao,
>
> Il 2020-09-14 18:50 Shlomi Fish ha scritto:
> > I was able to improve upon mpz_mod here:
>
> > This is in case the number that one divides by is itself divisible by a
> > large power of 2.
>
> There are many special forms for the divisors that can stimulate writing
> special code to speed up the modulus computation. The form k*2^n is one
> of them.
>
I see - well I'd imagine bits and powers of 2 to be especially quick.
> > The benchmarks are:
> > - mpz_mod runs the benchmark in about 160 seconds.
>
> Which benchmark? Does it involves general numbers or this special form
> only?
This special form only, and see:
```
[shlomif at localhost gmp-shift_divmod]$ git i
⇒ On branch master
⇒ Your branch is up to date with 'origin/master'.
?? divmod-gcc-with-shift.exe
?? divmod-gcc-wo-shift.exe
?? divmod-test.exe
?? ../more-itertools/
?? ../python-shift_divmod/dest/
⇒ Remotes:
origin git at github.com:shlomif/shift_divmod.git (fetch)
origin git at github.com:shlomif/shift_divmod.git (push)
[shlomif at localhost gmp-shift_divmod]$ pwd
/home/shlomif/progs/python/shift_divmod/gmp-shift_divmod
[shlomif at localhost gmp-shift_divmod]$ make benchmark
clang -D USE_SHIFT=1 -o divmod-clang-with-shift.exe -O3 -flto
-fomit-frame-pointer -march=native -Wall -Wextra -Weverything
-Wno-extra-semi-stmt shift_divmod_gmp.c -lgmp -ltcmalloc
clang -o divmod-clang-wo-shift.exe -O3 -flto -fomit-frame-pointer
-march=native -Wall -Wextra -Weverything -Wno-extra-semi-stmt
shift_divmod_gmp.c -lgmp -ltcmalloc
time ./divmod-gcc-with-shift.exe
```
Here:
https://github.com/shlomif/shift_divmod/tree/master/gmp-shift_divmod
> Which version of GMP are you using?
>
gmp-devel-6.1.2-13.fc32.x86_64
> > Is there interest in incorporating such a change to the core GMP
> > library?
>
> There already are tricks like that, in some internal functions of the
> library.
> E.g. since GMP-5.0 (2010) "The function mpz_powm is [...] faster for
> even modulus, since it now partially factors such modulus [...]".
> Should that form deserve some special lines of code in mpz_tdiv?
>
> By the way, if you want to play with this, try the below patch for GMP,
> recompile it, and test your "benchmark" again.
Thanks! I'd like to try that.
>
> ----8<------8<----
> diff -r b851f9336a86 mpz/tdiv_qr.c
> --- a/mpz/tdiv_qr.c Thu Sep 17 12:00:00 2020 +0200
> +++ b/mpz/tdiv_qr.c Mon Sep 21 17:11:50 2020 +0200
> @@ -36,7 +36,7 @@
> void
> mpz_tdiv_qr (mpz_ptr quot, mpz_ptr rem, mpz_srcptr num, mpz_srcptr den)
> {
> - mp_size_t ql;
> + mp_size_t ql, n0;
> mp_size_t ns, ds, nl, dl;
> mp_ptr np, dp, qp, rp;
> TMP_DECL;
> @@ -95,7 +95,12 @@
> np = tp;
> }
>
> - mpn_tdiv_qr (qp, rp, 0L, np, nl, dp, dl);
> + for (n0 = 0; *dp == 0; ++dp)
> + {
> + rp [n0++] = *np++;
> + --nl;
> + }
> + mpn_tdiv_qr (qp, rp + n0, 0L, np, nl, dp, dl - n0);
>
> ql -= qp[ql - 1] == 0;
> MPN_NORMALIZE (rp, dl);
> diff -r b851f9336a86 mpz/tdiv_r.c
> --- a/mpz/tdiv_r.c Thu Sep 17 12:00:00 2020 +0200
> +++ b/mpz/tdiv_r.c Mon Sep 21 17:11:50 2020 +0200
> @@ -35,7 +35,7 @@
> void
> mpz_tdiv_r (mpz_ptr rem, mpz_srcptr num, mpz_srcptr den)
> {
> - mp_size_t ql;
> + mp_size_t ql, n0;
> mp_size_t ns, nl, dl;
> mp_ptr np, dp, qp, rp;
> TMP_DECL;
> @@ -88,7 +88,12 @@
> np = tp;
> }
>
> - mpn_tdiv_qr (qp, rp, 0L, np, nl, dp, dl);
> + for (n0 = 0; *dp == 0; ++dp)
> + {
> + rp [n0++] = *np++;
> + --nl;
> + }
> + mpn_tdiv_qr (qp, rp + n0, 0L, np, nl, dp, dl - n0);
>
> MPN_NORMALIZE (rp, dl);
>
> ----8<------8<----
>
> Maybe it can speed up your case!-)
>
> Ĝis,
> m
>
> --
> http://bodrato.it/papers/
--
Shlomi Fish https://www.shlomifish.org/
Buddha has the Chuck Norris nature.
Please reply to list if it's a mailing list post - http://shlom.in/reply .
More information about the gmp-discuss
mailing list