udiv_qr_3by2 vs divappr

Mon Aug 27 21:47:30 UTC 2018

nisse at lysator.liu.se (Niels Möller) writes:

  paul zimmermann <Paul.Zimmermann at inria.fr> writes:

  > page 1: the division instruction is now much faster than before on modern
  >      processors

  According to https://gmplib.org/~tege/x86-timing.pdf, they're still an
  order of magnitute slower than multiplication. E.g 86 vs 3 cycles on
  Intel skylake. And in addition, multiplication is piplined, division
  likely isn't.

Last time I explored this, the division instructions (div/idiv) ran at
varying speed as long as the upper dividend word is zero.  When it is,
the timing is rougly linear in either log2(dividend) or log2(quotient).
But the fastst time is still some 10 cycles, IIRC.  And as Nisse says,
it is not pipelined.

The relative throughput between mul and div can still be on the order of
10x in the best case and 100x in the worst case.

-- 
Torbjörn
Please encrypt, key id 0xC8601622