udiv_qr_3by2 vs divappr
Torbjörn Granlund
tg at gmplib.org
Mon Aug 27 21:47:30 UTC 2018
nisse at lysator.liu.se (Niels Möller) writes:
paul zimmermann <Paul.Zimmermann at inria.fr> writes:
> page 1: the division instruction is now much faster than before on modern
> processors
According to https://gmplib.org/~tege/x86-timing.pdf, they're still an
order of magnitute slower than multiplication. E.g 86 vs 3 cycles on
Intel skylake. And in addition, multiplication is piplined, division
likely isn't.
Last time I explored this, the division instructions (div/idiv) ran at
varying speed as long as the upper dividend word is zero. When it is,
the timing is rougly linear in either log2(dividend) or log2(quotient).
But the fastst time is still some 10 cycles, IIRC. And as Nisse says,
it is not pipelined.
The relative throughput between mul and div can still be on the order of
10x in the best case and 100x in the worst case.
--
Torbjörn
Please encrypt, key id 0xC8601622
More information about the gmp-devel
mailing list