[Request for comments] Potential room for speedup when calculating divmod() of bases with many trailing 0 bits (in binary)

Shlomi Fish shlomif at gmail.com
Tue Sep 22 12:18:31 UTC 2020


Hi Marco and all!

On Tue, Sep 22, 2020 at 2:31 PM Shlomi Fish <shlomif at gmail.com> wrote:
>
> Hi Marco!
>
> Thanks for the reply.
>
> On Mon, Sep 21, 2020 at 6:30 PM Marco Bodrato <bodrato at mail.dm.unipi.it> wrote:
> >
> > Ciao,
> >
> > Il 2020-09-14 18:50 Shlomi Fish ha scritto:
> > > I was able to improve upon mpz_mod here:
> >
> > > This is in case the number that one divides by is itself divisible by a
> > > large power of 2.
> >
> > There are many special forms for the divisors that can stimulate writing
> > special code to speed up the modulus computation. The form k*2^n is one
> > of them.
> >
>
> I see - well I'd imagine bits and powers of 2 to be especially quick.
>
> > > The benchmarks are:
> > >    - mpz_mod runs the benchmark in about 160 seconds.
> >
> > Which benchmark? Does it involves general numbers or this special form
> > only?
>
> This special form only, and see:
>
> ```
> [shlomif at localhost gmp-shift_divmod]$ git i
> ⇒ On branch master
> ⇒ Your branch is up to date with 'origin/master'.
> ?? divmod-gcc-with-shift.exe
> ?? divmod-gcc-wo-shift.exe
> ?? divmod-test.exe
> ?? ../more-itertools/
> ?? ../python-shift_divmod/dest/
> ⇒ Remotes:
> origin  git at github.com:shlomif/shift_divmod.git (fetch)
> origin  git at github.com:shlomif/shift_divmod.git (push)
> [shlomif at localhost gmp-shift_divmod]$ pwd
> /home/shlomif/progs/python/shift_divmod/gmp-shift_divmod
> [shlomif at localhost gmp-shift_divmod]$ make benchmark
> clang -D USE_SHIFT=1 -o divmod-clang-with-shift.exe -O3 -flto
> -fomit-frame-pointer -march=native -Wall -Wextra -Weverything
> -Wno-extra-semi-stmt shift_divmod_gmp.c -lgmp -ltcmalloc
> clang -o divmod-clang-wo-shift.exe -O3 -flto -fomit-frame-pointer
> -march=native -Wall -Wextra -Weverything -Wno-extra-semi-stmt
> shift_divmod_gmp.c -lgmp -ltcmalloc
> time ./divmod-gcc-with-shift.exe
> ```
>
> Here:
>
> https://github.com/shlomif/shift_divmod/tree/master/gmp-shift_divmod
>
> > Which version of GMP are you using?
> >
>
> gmp-devel-6.1.2-13.fc32.x86_64
>
> > > Is there interest in incorporating such a change to the core GMP
> > > library?
> >
> > There already are tricks like that, in some internal functions of the
> > library.
> > E.g. since GMP-5.0 (2010) "The function mpz_powm is [...] faster for
> > even modulus, since it now partially factors such modulus [...]".
> > Should that form deserve some special lines of code in mpz_tdiv?
> >
> > By the way, if you want to play with this, try the below patch for GMP,
> > recompile it, and test your "benchmark" again.
>
> Thanks! I'd like to try that.
>

I reworked your patch to apply against gmplib hg head (see
https://www.shlomifish.org/Files/files/code/gnump-library-patch-by-Marco-Bodrato--speed-up-mpz_tdiv_qr.patch
- perhaps the rejects were caused by its embedding in the email), and
linking with that gnump made wo-shift faster than with-shift with the
original gnump:

```
[shlomif at localhost gmp-shift_divmod]$ (export
LD_LIBRARY_PATH=/home/shlomif/apps/gmp/lib/ ; time
./divmod-gcc-wo-shift.exe)
result = 5206685
./divmod-gcc-wo-shift.exe  30.21s user 0.02s system 99% cpu 30.310 total

```

It is still slower than pypy3 running the python code:

```
[shlomif at localhost gmp-shift_divmod]$ time pypy3
../python-shift_divmod/code/examples/shift_divmod_example.py
gen_shift_mod
{'val': 5206685, 'time': 25.57560157775879, 'reached': 1000,
'interrupted': False, 'mode': 'gen_shift_mod'}
pypy3 ../python-shift_divmod/code/examples/shift_divmod_example.py
24.23s user 1.07s system 98% cpu 25.634 total
[shlomif at localhost gmp-shift_divmod]$
```

Thanks for your help and insights.

-- 
Shlomi Fish https://www.shlomifish.org/

Buddha has the Chuck Norris nature.

Please reply to list if it's a mailing list post - http://shlom.in/reply .


More information about the gmp-discuss mailing list