I have been constructing division algorithms and want some feedback

William Galway galwaymath at gmail.com
Thu May 14 16:13:00 UTC 2020


Aaron:  It's great to see someone thinking about computation and
mathematics.  I'm not a GMP developer but I feel that your question is
phrased poorly and that you need to do more research on your own before
asking other people to look at it.

Any discrepancies that you see in the timings between GMP and your own code
are quite possibly due to differences in the function-calling interface
between GMP and your own function.  As you suggested, you really must be
sure you're comparing the algorithms themselves (and not the calling
interface), by "... rewriting your functions in C++ and transplanting them
into your local GMP ...".  The C language would also be a good choice, to
get you even closer to the "raw" code.  You should write a single program
to test both functions.  Your program should call each of the functions
about a billion times and you should run the program a few times -- when I
run benchmarks I'm often surprised by the discrepancies between different
runs

After that, if you think you've found something worthwhile, please take
more care when you present your information.  Let us know what your CPU is,
what version of GMP you're using and how it was installed, precisely what
GMP function are you calling, what operating system you're using....
You're not reporting a bug in GMP, but it would be best if you give the
information requested for a bug report.  See
  https://gmplib.org/manual/Reporting-Bugs.html
of course, don't submit that information to GMP-bugs!

When reporting your timing results, please double check your arithmetic.  A
rate of 8.56 million operations per second works out to roughly 117
nanoseconds per call, while "a 20 microsecond delay" works out to 50
THOUSAND operations/second.

-- Regards, Will



On Wed, May 13, 2020 at 3:02 AM Aaron Kutch <aaronkutch at att.net> wrote:

> > I can assure you that a 128/64-bit division on any modern 64-bit
> > processor does not take anywhere near 21 microseconds using GMP.
>
> Sorry, I should have taken a second look at the 21 microsecond value. I
> looked at the generated assembly and it looks like the Rust/C++ interfacing
> is working properly with the GMP divide function being directly called, but
> for some reason all operations have this huge ~20 microsecond delay being
> put on them, even with LTO enabled. I’m not sure where it is coming from,
> maybe it is caused by WSL or some other issue I’m not seeing.
>
> I rebenchmarked using gmpbench-0.2 and found that GMP can perform 128/64
> bit divisions at a rate of 8.56 million operations per second, while my
> Rust benchmarking claims 19 million operations per second. Unfortunately,
> I’m not sure how to properly compare GMP and my method (there are a lot of
> variables that could be making my method seem faster) without rewriting my
> functions in C++ and transplanting them into my local GMP to be
> benchmarked. How does bmpbench work? It appears to be repeatedly calling
> `mpz_tdiv_q(z, x, y)` in a loop without randomizing x and y on each
> iteration (I suppose that randomization isn’t important for very large
> integers). My benchmarking uses an array of 32 random pairs of values to
> divide in a loop, and relies on `cargo bench` to get an average timing.
> _______________________________________________
> gmp-discuss mailing list
> gmp-discuss at gmplib.org
> https://gmplib.org/mailman/listinfo/gmp-discuss
>


More information about the gmp-discuss mailing list