hgcd1/2

Torbjörn Granlund tg at gmplib.org
Tue Sep 3 11:17:33 UTC 2019


tg at gmplib.org (Torbjörn Granlund) writes:

  nisse at lysator.liu.se (Niels Möller) writes:

    In that case, not so surprising that the div1 function loses. Do other
    architectures also have decent performance for small-quotient division?

  I don't have the full picture, I'm afraid.

  I know several ARM cores have great division performance for small
  quotients.  For x86 I know of cores with horrible performance and ones
  (like Haswell and later) with half decent performance.  I assume newer
  AMD cores got this right.

I ran tests on shell (Intel Ivy Bridge, from around 2012) and ashell
(AMD Ryzen 2700X from 2018) with this simple program:

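/* CLOCK is not defined in this file; presumably it is the CPU clock
   frequency in Hz, supplied at compile time (e.g. with -DCLOCK=...).
   The loops then perform about CLOCK divisions in total, so the run
   time in seconds approximates the cycles per division.  */
unsigned long qs[1000];
int
main (void)
{
  unsigned long r, i;
  for (r = 0; r < CLOCK/1000; r++)
    {
      for (i = 0; i < 1000; i++)
        {
          /* Small quotient: 2 for i = 0, otherwise 1.  */
          qs[i] = 2000 / (i + 1000);
        }
    }
  return 0;
}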

The Intel system reports ~23 cycles per division; the AMD system reports
~13.
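
For anyone who wants to repeat this without knowing the clock frequency
up front, here is a rough self-timing sketch of the same loop.  It is
not the program used for the numbers above: the function name
ns_per_division, the volatile qualifiers and the use of clock() are my
own assumptions.  It reports nanoseconds per division, including loop
and store overhead, so multiply by the clock in GHz to get an
approximate cycle count.

```c
#include <assert.h>
#include <time.h>

#define N 1000

/* volatile: every store must be performed, and the numerator must be
   reloaded, so the compiler cannot fold the divisions away.  */
volatile unsigned long qs[N];
volatile unsigned long numerator = 2000;

/* Hypothetical helper: average nanoseconds per division, measured
   over reps * N divisions with small quotients (1 or 2).  */
double
ns_per_division (unsigned long reps)
{
  unsigned long r, i;
  clock_t t0, t1;

  t0 = clock ();
  for (r = 0; r < reps; r++)
    for (i = 0; i < N; i++)
      qs[i] = numerator / (i + N);
  t1 = clock ();

  /* Sanity check: 2000 / 1000 must be 2.  */
  assert (qs[0] == 2);

  return (double) (t1 - t0) / CLOCKS_PER_SEC * 1e9 / ((double) reps * N);
}
```

Calling ns_per_division (1000000) from main and printing the result
runs a billion divisions, which should take a few seconds on the
systems mentioned here.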

ARM systems impress more: a73 gives 5 cycles/division, a72 gives 6.
Even a low-end a53 gives 5.  (The many ARM systems are always on;
they're hiding behind ashell.)

So I think plain / is the way to go for certain systems!
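
To make the trade-off concrete, here is a small sketch of the two
approaches.  This is not GMP's div1; div_small, div_plain, qr_t and
the MAX_SUB cutoff are hypothetical names chosen for illustration.
One variant tries a few subtractions before falling back to the
hardware instruction, the other just uses / and %.

```c
#include <assert.h>

/* Quotient and remainder pair (hypothetical type for this sketch).  */
typedef struct { unsigned long q, r; } qr_t;

/* Arbitrary cutoff: number of subtractions tried before using /.  */
#define MAX_SUB 4

/* Plain division: leave everything to the hardware divider.  */
qr_t
div_plain (unsigned long n, unsigned long d)
{
  qr_t res;
  res.q = n / d;
  res.r = n % d;
  return res;
}

/* Small-quotient variant: subtract d up to MAX_SUB times, then fall
   back to / for the rest.  Requires d != 0.  */
qr_t
div_small (unsigned long n, unsigned long d)
{
  qr_t res;
  unsigned long q;

  assert (d != 0);
  for (q = 0; q < MAX_SUB; q++)
    {
      if (n < d)
        {
          res.q = q;
          res.r = n;
          return res;
        }
      n -= d;
    }
  /* Quotient is at least MAX_SUB; n now holds the original n minus
     MAX_SUB * d.  */
  res.q = MAX_SUB + n / d;
  res.r = n % d;
  return res;
}
```

On a core where a small-quotient division costs around 5 cycles, the
compare-and-branch loop in div_small has little room to win, which is
the point made above.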

-- 
Torbjörn
Please encrypt, key id 0xC8601622


More information about the gmp-devel mailing list