udiv_qr_3by2 vs divappr
Torbjörn Granlund
tg at gmplib.org
Tue Aug 28 06:28:27 UTC 2018
Some more measurements. The first 3 result columns are for division,
for gradually harder cases; B=0xcafebabedeadbeef. The last column is
widening multiplication.
cpu type      17/3  B/4711  2^64/3  mul  presumed algorithm
skylake         25      25      86    1  SRT-8
broadwell       26      25      82    1  SRT-8
haswell         27      27      78    1  SRT-8
sandybridge     29      29      82    1  SRT-8
nehalem         20      22      72    2  SRT-8
penryn          20      21      57    4  SRT-8
nocona          54      54     149   11  non-restoring (or SRT-4?)
zen+            15      39      45    2  SRT-4
excavator       16      67      80    5  early-out non-restoring
piledriver      14      62      74    4  early-out non-restoring
k10             20      79      79    2  early-out non-restoring
goldmont        13      36      42    3  early-out SRT-4
silvermont      29      42      98    4  early-out SRT-4
diamondville   130     130     138   18  compare, subtract
We see that AMD, unlike Intel's mainstream cores, does early-out
division, making small-quotient cases run quite fast on AMD. Among
Intel chips, only the low-power goldmont and silvermont (mostly
marketed under the Atom brand) do early-out.
Intel handles the case where the dividend < 2^64 specially. AMD seems
to let their early-out work all the way, with no sudden performance drop
at 2^64.
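Which of the three test dividends fall below 2^64 can be checked by
splitting each one into the high and low 64-bit words, the form in
which x86-64 DIV takes its dividend (RDX:RAX). This is only a small
Python sketch with hypothetical helper names; how the hardware detects
or exploits a zero high word is not something these measurements show.

```python
MASK64 = (1 << 64) - 1


def split_dividend(n):
    """Split a 128-bit dividend into (hi, lo) 64-bit words (RDX, RAX)."""
    if not 0 <= n < (1 << 128):
        raise ValueError("dividend must fit in 128 bits")
    return n >> 64, n & MASK64


def has_zero_high_word(n):
    """True when the dividend is below 2^64, i.e. the high word is zero."""
    return split_dividend(n)[0] == 0


if __name__ == "__main__":
    B = 0xCAFEBABEDEADBEEF
    for name, n in (("17", 17), ("B", B), ("2^64", 1 << 64)):
        hi, lo = split_dividend(n)
        print(f"{name:5} hi={hi:#x} lo={lo:#x} below 2^64: {has_zero_high_word(n)}")
```

Only the 2^64/3 case has a nonzero high word, which matches the column
where the Intel numbers jump.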
It's curious that Intel's best division hardware is found in penryn (aka
core 2) and recent Atom chips.
--
Torbjörn
Please encrypt, key id 0xC8601622