udiv_qr_3by2 vs divappr
tg at gmplib.org
Mon Aug 27 21:47:30 UTC 2018
nisse at lysator.liu.se (Niels Möller) writes:
paul zimmermann <Paul.Zimmermann at inria.fr> writes:
> page 1: the division instruction is now much faster than before on modern
According to https://gmplib.org/~tege/x86-timing.pdf, they're still an
order of magnitute slower than multiplication. E.g 86 vs 3 cycles on
Intel skylake. And in addition, multiplication is piplined, division
Last time I explored this, the division instructions (div/idiv) ran at
varying speed as long as the upper dividend word is zero. When it is,
the timing is rougly linear in either log2(dividend) or log2(quotient).
But the fastst time is still some 10 cycles, IIRC. And as Nisse says,
it is not pipelined.
The relative throughput between mul and div can still be on the order of
10x in the best case and 100x in the worst case.
Please encrypt, key id 0xC8601622
More information about the gmp-devel