[Request for comments] Potential room for speedup when calculating divmod() of bases with many trailing 0 bits (in binary)

Tue Sep 22 11:31:23 UTC 2020

Hi Marco!

Thanks for the reply.

On Mon, Sep 21, 2020 at 6:30 PM Marco Bodrato <bodrato at mail.dm.unipi.it> wrote:
>
> Ciao,
>
> Il 2020-09-14 18:50 Shlomi Fish ha scritto:
> > I was able to improve upon mpz_mod here:
>
> > This is in case the number that one divides by is itself divisible by a
> > large power of 2.
>
> There are many special forms for the divisors that can stimulate writing
> special code to speed up the modulus computation. The form k*2^n is one
> of them.
>

I see - well I'd imagine bits and powers of 2 to be especially quick.

> > The benchmarks are:
> >    - mpz_mod runs the benchmark in about 160 seconds.
>
> Which benchmark? Does it involves general numbers or this special form
> only?

This special form only, and see:

```
[shlomif at localhost gmp-shift_divmod]$ git i
⇒ On branch master
⇒ Your branch is up to date with 'origin/master'.
?? divmod-gcc-with-shift.exe
?? divmod-gcc-wo-shift.exe
?? divmod-test.exe
?? ../more-itertools/
?? ../python-shift_divmod/dest/
⇒ Remotes:
origin  git at github.com:shlomif/shift_divmod.git (fetch)
origin  git at github.com:shlomif/shift_divmod.git (push)
[shlomif at localhost gmp-shift_divmod]$ pwd
/home/shlomif/progs/python/shift_divmod/gmp-shift_divmod
[shlomif at localhost gmp-shift_divmod]$ make benchmark
clang -D USE_SHIFT=1 -o divmod-clang-with-shift.exe -O3 -flto
-fomit-frame-pointer -march=native -Wall -Wextra -Weverything
-Wno-extra-semi-stmt shift_divmod_gmp.c -lgmp -ltcmalloc
clang -o divmod-clang-wo-shift.exe -O3 -flto -fomit-frame-pointer
-march=native -Wall -Wextra -Weverything -Wno-extra-semi-stmt
shift_divmod_gmp.c -lgmp -ltcmalloc
time ./divmod-gcc-with-shift.exe
```

Here:

https://github.com/shlomif/shift_divmod/tree/master/gmp-shift_divmod

> Which version of GMP are you using?
>

gmp-devel-6.1.2-13.fc32.x86_64

> > Is there interest in incorporating such a change to the core GMP
> > library?
>
> There already are tricks like that, in some internal functions of the
> library.
> E.g. since GMP-5.0 (2010) "The function mpz_powm is [...] faster for
> even modulus, since it now partially factors such modulus [...]".
> Should that form deserve some special lines of code in mpz_tdiv?
>
> By the way, if you want to play with this, try the below patch for GMP,
> recompile it, and test your "benchmark" again.

Thanks! I'd like to try that.

>
> ----8<------8<----
> diff -r b851f9336a86 mpz/tdiv_qr.c
> --- a/mpz/tdiv_qr.c     Thu Sep 17 12:00:00 2020 +0200
> +++ b/mpz/tdiv_qr.c     Mon Sep 21 17:11:50 2020 +0200
> @@ -36,7 +36,7 @@
>   void
>   mpz_tdiv_qr (mpz_ptr quot, mpz_ptr rem, mpz_srcptr num, mpz_srcptr den)
>   {
> -  mp_size_t ql;
> +  mp_size_t ql, n0;
>     mp_size_t ns, ds, nl, dl;
>     mp_ptr np, dp, qp, rp;
>     TMP_DECL;
> @@ -95,7 +95,12 @@
>         np = tp;
>       }
>
> -  mpn_tdiv_qr (qp, rp, 0L, np, nl, dp, dl);
> +  for (n0 = 0; *dp == 0; ++dp)
> +    {
> +      rp [n0++] = *np++;
> +      --nl;
> +    }
> +  mpn_tdiv_qr (qp, rp + n0, 0L, np, nl, dp, dl - n0);
>
>     ql -=  qp[ql - 1] == 0;
>     MPN_NORMALIZE (rp, dl);
> diff -r b851f9336a86 mpz/tdiv_r.c
> --- a/mpz/tdiv_r.c      Thu Sep 17 12:00:00 2020 +0200
> +++ b/mpz/tdiv_r.c      Mon Sep 21 17:11:50 2020 +0200
> @@ -35,7 +35,7 @@
>   void
>   mpz_tdiv_r (mpz_ptr rem, mpz_srcptr num, mpz_srcptr den)
>   {
> -  mp_size_t ql;
> +  mp_size_t ql, n0;
>     mp_size_t ns, nl, dl;
>     mp_ptr np, dp, qp, rp;
>     TMP_DECL;
> @@ -88,7 +88,12 @@
>         np = tp;
>       }
>
> -  mpn_tdiv_qr (qp, rp, 0L, np, nl, dp, dl);
> +  for (n0 = 0; *dp == 0; ++dp)
> +    {
> +      rp [n0++] = *np++;
> +      --nl;
> +    }
> +  mpn_tdiv_qr (qp, rp + n0, 0L, np, nl, dp, dl - n0);
>
>     MPN_NORMALIZE (rp, dl);
>
> ----8<------8<----
>
> Maybe it can speed up your case!-)
>
> Ĝis,
> m
>
> --
> http://bodrato.it/papers/

-- 
Shlomi Fish https://www.shlomifish.org/

Buddha has the Chuck Norris nature.

Please reply to list if it's a mailing list post - http://shlom.in/reply .