[Request for comments] Potential room for speedup when calculating divmod() of bases with many trailing 0 bits (in binary)

Wed Sep 23 06:13:35 UTC 2020

Hi Marco!

On Wed, Sep 23, 2020 at 12:30 AM Marco Bodrato <bodrato at mail.dm.unipi.it> wrote:
>
> Ciao,
>
> Il 2020-09-22 14:18 Shlomi Fish ha scritto:
> > linking with that gnump made wo-shift faster than with-shift with the
> > original gnump:
>
> You mean that your implementation of the trick was not needed any more,
> right?
>

Yes. gnump with your patch applied running the normal mpz_fdiv_qr code
was faster than the unpatched gnump running my shift-divmod wrapping
code.

> > It is still slower than pypy3 running the python code:
>
> Well, maybe there is too much overhead in the benchmark you prepared. I
> glanced at it and saw a lot of _init/_clear, assert, a new type, a
> scratch_type... just for computing some divisions?

I was trying to implement a reusable API in the .h file. The .c
benchmark is somewhat cluttered due to the USE_SHIFT define.

> Or maybe the numbers are not big enough to let GMP really enter in his
> range of usefulness... (if most of the test involves e.g. 32-bits
> numbers, then GMP may be far slower than some other implementations).
>

I think most of them are above 32-bits (given i do `newval = val*val`
at every stage).

> You should try to write more concise benchmarks (someone also asked for
> more concise e-mails :-)
>

I'll see what I can do.

-- 
Shlomi Fish https://www.shlomifish.org/

Buddha has the Chuck Norris nature.

Please reply to list if it's a mailing list post - http://shlom.in/reply .