I have been constructing division algorithms and want some feedback
Torbjörn Granlund
tg at gmplib.org
Tue May 12 21:19:24 UTC 2020
Aaron Kutch <aaronkutch at att.net> writes:
I have spent much time optimizing a set of division algorithms for
Rust’s `compiler-builtins` (a pure Rust version of LLVM’s
`compiler-rt`). I am emailing to gmp-discuss because I think my 128
bit `_trifecta` and `asymmetric` algorithms could act as good base
cases for division functions in GMP, and because I don’t know any
group of people other than those in GMP who could better review my
algorithms. I ran a simple benchmark of dividing a random 128 bit
integer with a 64 bit random integer and found that GMP took 21
microseconds to calculate it using `Integer::div_rem_mut` (using the
Rust `rug` crate which binds to GMP), while my
`specialized_div_rem::u128_div_rem_asymmetric` function took only 1.2
microseconds. I understand that my function deals with integers on the
stack and has fewer function call overheads, but division is such an
expensive operation that I would not expect much difference if the
internal algorithms performed similarly. The Git repo for my
algorithms is https://github.com/AaronKutch/specialized-div-rem, the
crates.io link is https://crates.io/crates/specialized-div-rem, and
the in-progress PR for `compiler-builtins` is
https://github.com/rust-lang/compiler-builtins/pull/332.
I can assure you that a 128/64-bit division on any modern 64-bit
processor does not take anywhere near 21 microseconds using GMP. That
would be some 50,000 cycles. Actually, it is perhaps 500 times faster
than that if you call GMP directly.
--
Torbjörn
Please encrypt, key id 0xC8601622
More information about the gmp-discuss
mailing list