hgcd1/2

Torbjörn Granlund tg at gmplib.org
Tue Sep 3 13:22:36 UTC 2019


I tested newer Intel systems too (Haswell, Skylake), and they all need
around 25 cycles for a division with quotient n/d = 1.

Intel Goldmont Plus (a current low-end CPU) is better; it needs about 12
cycles.  AMD CPUs from the last 10 years all perform OK.
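
For anyone who wants to reproduce this kind of number, a minimal sketch of
a latency measurement follows.  It is not the code used for the figures in
this message.  The names div_chain_q1 and ns_per_div_q1 are made up for
illustration, and the operands are an arbitrary pair with n < 2d.  Each
dividend depends on the previous quotient, so the loop measures latency
rather than throughput.

```c
#include <stdint.h>
#include <time.h>

/* Sink that keeps the compiler from discarding the division chain. */
static volatile uint64_t sink;

/* Hypothetical helper: run `iters` dependent 64-bit divisions whose
   quotient is always 1, and return the last quotient computed. */
uint64_t div_chain_q1(uint64_t iters)
{
    /* Volatile loads hide the operand values from the optimizer. */
    volatile uint64_t vn = UINT64_C(0xF000000000000000);
    volatile uint64_t vd = UINT64_C(0x9000000000000000);
    uint64_t n = vn, d = vd;
    uint64_t x = n, q = 0;

    for (uint64_t i = 0; i < iters; i++) {
        q = x / d;        /* d <= x < 2d, so the quotient is 1 */
        x = n + (q - 1);  /* next dividend depends on this quotient */
    }
    return q;
}

/* Hypothetical helper: average CPU time per division in nanoseconds,
   measured with the standard clock() function. */
double ns_per_div_q1(uint64_t iters)
{
    clock_t t0 = clock();
    sink = div_chain_q1(iters);
    clock_t t1 = clock();
    return 1e9 * (double)(t1 - t0) / CLOCKS_PER_SEC / (double)iters;
}
```

Multiplying the result by the core clock in GHz gives an approximate cycle
count.  The figure includes a cycle or so of loop overhead per iteration,
and frequency scaling will distort it unless the clock is pinned.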

It is funny that x86 vendors give division so little thought.  ARM
clearly got it right.  I mean, doing SRT for just the non-zero part of
the quotient cannot be very hard!
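
To make the point concrete, here is a toy model, not a description of any
real divider.  It assumes a digit-recurrence unit that retires a fixed
number of quotient bits per step, and compares the step count when only the
significant quotient bits are produced against always producing all 64.
The function names and the bits-per-step parameter are assumptions for
illustration only.

```c
#include <stdint.h>

/* Number of significant bits in q (0 when q == 0). */
unsigned bit_length(uint64_t q)
{
    unsigned bits = 0;
    while (q != 0) {
        bits++;
        q >>= 1;
    }
    return bits;
}

/* Toy model: steps needed when only the non-zero part of the quotient
   is generated, bits_per_step bits at a time.  Requires d != 0 and
   bits_per_step != 0. */
unsigned steps_early_exit(uint64_t n, uint64_t d, unsigned bits_per_step)
{
    unsigned qbits = bit_length(n / d);
    return (qbits + bits_per_step - 1) / bits_per_step;
}

/* Toy model: steps needed when all 64 quotient bits are always
   generated, regardless of the operands. */
unsigned steps_fixed(unsigned bits_per_step)
{
    return (64 + bits_per_step - 1) / bits_per_step;
}
```

With 2 bits per step (radix 4), a quotient of 1 takes a single step in this
model, against 32 steps for the fixed-length variant.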

(ARM processors before the A77 have very poor multiplication, though.)

CPU         cycles
AMD bd1     22
AMD bd2     15
AMD bd4     15
AMD zn1     14
AMD zn2     14
AMD bt2     13
Intel hwl   25
Intel sky   25
Intel slm   30
Intel glm   13
Intel glm+  12

-- 
Torbjörn
Please encrypt, key id 0xC8601622


More information about the gmp-devel mailing list