tg at gmplib.org
Tue Sep 3 11:17:33 UTC 2019
tg at gmplib.org (Torbjörn Granlund) writes:

  nisse at lysator.liu.se (Niels Möller) writes:

    In that case, not so surprising that the div1 function loses. Do other
    architectures also have decent performance for small-quotient division?

  I don't have the full picture, I'm afraid.

  I know several ARM cores have great division performance for small
  quotients. For x86 I know of cores with horrible performance and ones
  (like Haswell and later) with half decent performance. I assume newer
  AMD cores got this right.

I ran tests on shell (Intel Ivy Bridge, from around 2012) and ashell
(AMD Ryzen 2700X from 2018) with this simple program:

  unsigned long qs[1000];
  unsigned long r, i;
  for (r = 0; r < CLOCK/1000; r++)
    for (i = 0; i < 1000; i++)
      qs[i] = 2000 / (i + 1000);

The Intel system reports ~23 cycles per division, the AMD system reports
ARM systems impress more, a73 gives 5 cycles/division, a72 gives 6.
Even a low-end a53 gives 5. (The many ARM systems are always on,
they're hiding behind ashell.)
So I think plain / is the way to go for certain systems!
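
For anyone who wants to repeat the measurement, here is a rough,
self-contained sketch of the loop above. It is not the exact program that
was run. The names div_pass, seconds_per_division and cycles_per_division
are made up for illustration, timing uses the standard clock() function,
and the conversion to cycles assumes the core runs at a fixed frequency
that you supply yourself. The volatile qualifier is only there to keep the
compiler from discarding the stores. Since the divisions are independent of
each other, this measures throughput rather than latency.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define NQ 1000  /* quotients per inner pass, as in the loop above */

/* One inner pass: qs[i] = num / (i + 1000).  With num = 2000 every
   quotient is 1 or 2, i.e. a small-quotient division.  */
static void
div_pass (volatile unsigned long *qs, unsigned long num)
{
  size_t i;
  for (i = 0; i < NQ; i++)
    qs[i] = num / (i + 1000);
}

/* Hypothetical helper: run REPS passes and return the CPU time in
   seconds spent per division, as reported by clock().  */
static double
seconds_per_division (unsigned long reps, unsigned long num)
{
  static volatile unsigned long qs[NQ];
  unsigned long r;
  clock_t t0, t1;

  assert (reps > 0);
  t0 = clock ();
  for (r = 0; r < reps; r++)
    div_pass (qs, num);
  t1 = clock ();
  return (double) (t1 - t0) / CLOCKS_PER_SEC / ((double) reps * NQ);
}

/* Convert seconds to cycles.  CLOCK_HZ is an assumption supplied by the
   caller; frequency scaling will make the result inaccurate.  */
static double
cycles_per_division (double seconds, double clock_hz)
{
  return seconds * clock_hz;
}
```

Calling seconds_per_division with a few million passes and num = 2000, then
feeding the result to cycles_per_division together with the nominal clock
frequency, should give a figure comparable in spirit to the ones quoted
above, though compiler flags and frequency scaling will affect it.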
Please encrypt, key id 0xC8601622