I have been constructing division algorithms and want some feedback

Aaron Kutch aaronkutch at att.net
Wed May 13 02:47:20 UTC 2020


> I can assure you that a 128/64-bit division on any modern 64-bit
> processor does not take anywhere near 21 microseconds using GMP. 

Sorry, I should have taken a second look at the 21 microsecond value. I looked at the generated assembly and it looks like the Rust/C++ interfacing is working properly with the GMP divide function being directly called, but for some reason all operations have this huge ~20 microsecond delay being put on them, even with LTO enabled. I’m not sure where it is coming from, maybe it is caused by WSL or some other issue I’m not seeing.

I rebenchmarked using gmpbench-0.2 and found that GMP can perform 128/64 bit divisions at a rate of 8.56 million operations per second, while my Rust benchmarking claims 19 million operations per second. Unfortunately, I’m not sure how to properly compare GMP and my method (there are a lot of variables that could be making my method seem faster) without rewriting my functions in C++ and transplanting them into my local GMP to be benchmarked. How does bmpbench work? It appears to be repeatedly calling `mpz_tdiv_q(z, x, y)` in a loop without randomizing x and y on each iteration (I suppose that randomization isn’t important for very large integers). My benchmarking uses an array of 32 random pairs of values to divide in a loop, and relies on `cargo bench` to get an average timing.


More information about the gmp-discuss mailing list