udiv_qr_3by2 vs divappr

Torbjörn Granlund tg at gmplib.org
Tue Aug 28 06:28:27 UTC 2018


Some more measurements.  The first 3 result columns are for division,
for gradually harder cases; B=0xcafebabedeadbeef.  The last column is
widening multiplication.

cpu type         17/3  B/4711  2^64/3   mul     presumed algorithm
skylake           25     25       86     1      SRT-8
broadwell         26     25       82     1      SRT-8
haswell           27     27       78     1      SRT-8
sandybridge       29     29       82     1      SRT-8
nehalem           20     22       72     2      SRT-8
penryn            20     21       57     4      SRT-8
nocona            54     54      149    11      non-restoring (or SRT-4?)
zen+              15     39       45     2      SRT-4
excavator         16     67       80     5      early-out non-restoring
piledriver        14     62       74     4      early-out non-restoring
k10               20     79       79     2      early-out non-restoring
goldmont          13     36       42     3      early-out SRT-4
silvermont        29     42       98     4      early-out SRT-4
diamondville     130    130      138    18      compare, subtract

We see that AMD, unlike Intel, does early-out division, making small
quotient cases run quite fast on AMD.  Among Intel's chips, only the
low-power goldmont and silvermont (mostly marketed under the Atom brand)
do early-out.
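
To make "small quotient" concrete, here is a minimal sketch that counts the
significant quotient bits for the three division cases in the table.  The
helper name quotient_bits and the u128 typedef are made up for illustration,
and unsigned __int128 is a GCC/Clang extension.  With it, 17/3 has a 3-bit
quotient, B/4711 a 52-bit one, and 2^64/3 a 63-bit one, which matches the
order in which an early-out divider would presumably slow down.

```c
#include <assert.h>
#include <stdint.h>

/* 128-bit dividend type; a GCC/Clang extension, not standard C. */
typedef unsigned __int128 u128;

/* Number of significant bits in the quotient n / d (0 when n < d).
   Hypothetical helper, for illustration only. */
int quotient_bits(u128 n, uint64_t d)
{
    assert(d != 0);          /* division by zero is not a valid case */
    u128 q = n / d;
    int bits = 0;
    while (q != 0) {         /* shift out one bit per iteration */
        bits++;
        q >>= 1;
    }
    return bits;
}
```

For example, quotient_bits((u128)1 << 64, 3) covers the 2^64/3 column.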

Intel handles the case where the dividend < 2^64 specially.  AMD seems
to let their early-out work all the way, with no sudden performance drop
at 2^64.
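
The 2^64 boundary is simply the point where the high limb of the two-limb
dividend becomes nonzero.  A small sketch, with made-up helper names and
again relying on the unsigned __int128 extension, shows which of the table's
dividends fall below it: 17 and B do, 2^64 does not.  That fits the jump in
the third division column on the Intel rows.

```c
#include <stdint.h>

/* 128-bit dividend type; a GCC/Clang extension, not standard C. */
typedef unsigned __int128 u128;

/* High 64-bit limb of a 128-bit dividend (the half that the x86-64
   div instruction takes in rdx).  Hypothetical helper. */
uint64_t high_limb(u128 n)
{
    return (uint64_t)(n >> 64);
}

/* Returns 1 when the dividend is below 2^64, i.e. its high limb is
   zero, and 0 otherwise.  Hypothetical helper. */
int dividend_below_2_64(u128 n)
{
    return high_limb(n) == 0;
}
```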

It's curious that Intel's best division hardware is found in penryn (aka
core 2) and recent Atom chips.

-- 
Torbjörn
Please encrypt, key id 0xC8601622

