hgcd1/2
Torbjörn Granlund
tg at gmplib.org
Sun Sep 15 12:30:01 UTC 2019
Two configs hesitantly chose HGCD2_DIV1_METHOD = 2. For k10/64, method
2 outperforms method 3 by 0.34%. For ARM Cortex-A8, method 2's advantage
is 1.94%. Wow.
Looking at when method 1 or 3 is faster than 2 is more interesting.
Method 1, and to some extent also method 3, would benefit from asm code,
so unless they are beaten by some margin, they might be the soundest
choice for more configs.
When method 2 is beaten, it is always by method 3, and then always by
a low single-digit percentage.
Questions for Niels:
Would your present tuneup/speed setup allow measuring asm code?
The current div1 measurements include hgcd2's own time, right? I.e., if
we found a div1 which runs in zero cycles, the timings would not be
zero.
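
To make the concern concrete, here is a minimal sketch of what the
measured percentages would mean for div1 itself, assuming the measured
time is simply hgcd2 overhead plus time spent in div1. The function
name and the div1_share parameter are hypothetical; the actual share of
time spent in div1 has not been measured here.

```c
#include <assert.h>

/* Assumed model (not taken from the tuneup code):
     t_total = t_overhead + t_div1
   where t_overhead is hgcd2's own time and is the same for both methods.

   total_gain_percent: measured gain of the faster method, relative to
                       the total time of the slower method.
   div1_share:         fraction of the slower method's total time spent
                       in div1, in (0, 1].  Hypothetical input.

   Returns the implied gain within div1 alone, in percent.  */
static double
div1_gain_percent (double total_gain_percent, double div1_share)
{
  assert (div1_share > 0.0 && div1_share <= 1.0);
  /* The absolute time saved is the same in both views; only the
     denominator shrinks from t_total to t_div1.  */
  return total_gain_percent / div1_share;
}
```

Under this model, if div1 accounted for half of the measured time, the
0.34% seen for k10/64 would correspond to a 0.68% difference in div1
itself. The smaller div1's share, the more the current numbers
understate the real difference between the methods.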
--
Torbjörn
Please encrypt, key id 0xC8601622