I have been constructing division algorithms and want some feedback

Torbjörn Granlund tg at gmplib.org
Tue May 12 21:19:24 UTC 2020


Aaron Kutch <aaronkutch at att.net> writes:

  I have spent much time optimizing a set of division algorithms for
  Rust’s `compiler-builtins` (a pure Rust version of LLVM’s
  `compiler-rt`). I am emailing to gmp-discuss because I think my 128
  bit `_trifecta` and `asymmetric` algorithms could act as good base
  cases for division functions in GMP, and because I don’t know any
  group of people other than those in GMP who could better review my
  algorithms. I ran a simple benchmark of dividing a random 128 bit
  integer with a 64 bit random integer and found that GMP took 21
  microseconds to calculate it using `Integer::div_rem_mut` (using the
  Rust `rug` crate which binds to GMP), while my
  `specialized_div_rem::u128_div_rem_asymmetric` function took only 1.2
  microseconds. I understand that my function deals with integers on the
  stack and has fewer function call overheads, but division is such an
  expensive operation that I would not expect much difference if the
  internal algorithms performed similarly. The Git repo for my
  algorithms is https://github.com/AaronKutch/specialized-div-rem, the
  crates.io link is https://crates.io/crates/specialized-div-rem, and
  the in-progress PR for `compiler-builtins` is
  https://github.com/rust-lang/compiler-builtins/pull/332.

I can assure you that a 128/64-bit division on any modern 64-bit
processor does not take anywhere near 21 microseconds using GMP.  That
would be some 50,000 cycles.  Actually, it is perhaps 500 times faster
than that if you call GMP directly.

-- 
Torbjörn
Please encrypt, key id 0xC8601622


More information about the gmp-discuss mailing list