Torbjörn Granlund tg at gmplib.org
Sun Sep 15 12:30:01 UTC 2019

Two configs hesitantly chose HGCD2_DIV1_METHOD = 2.  For k10/64, method
2 outperforms method 3 by 0.34%.  For ARM Cortex-A8 method 2's advantage
is 1.94%.  Wow.

Looking at when method 1 or 3 is faster than 2 is more interesting.
Method 1, and to some extent also method 3, would benefit from asm code,
so unless they are beaten by some margin, they might be the soundest
choice for more configs.

When method 2 is beaten, it is always by method 3, and then always by
a low single-digit percentage.

Questions for Niels:

Would your present tuneup/speed setup allow measuring asm code?

The current div1 measurements include hgcd2's own time, right?  I.e., if
we found a div1 which runs in zero cycles, the timings would not be

Please encrypt, key id 0xC8601622
