div_qr_1 interface

Torbjorn Granlund tg at gmplib.org
Mon Oct 21 01:47:51 CEST 2013

I took a brief look at the loop of the new assembly code.

Have you analysed the register needs?  Pushing all callee-saves
registers is quite expensive.

For the mul insn, it is usually better to copy the invariant/noncritical
operand to rax, and use the critical operand explicitly in the mul insn.

I suspect one or two of the register-to-register copy insns could be
optimised out.

In order to run this through the loopmixer, you need to set up data in
the prologue which ensures that the adjustment branch is never taken.
Letting the inverse be 0 or else B-1 might work...
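
For reference, here is a minimal C sketch of a 2/1 division step with a
precomputed inverse, to show where such an adjustment branch sits. It is
not the assembly loop under discussion. It assumes B = 2^64, a
normalised divisor d (top bit set), u1 < d, and a compiler providing
unsigned __int128. The names invert_limb and div_2by1_preinv are
illustrative only, and treating the second, rarely taken correction as
"the adjustment branch" is an assumption.

```c
#include <stdint.h>

typedef unsigned __int128 u128;   /* GCC/Clang extension, assumed available */

/* Illustrative helper: v = floor((B^2 - 1) / d) - B, with B = 2^64.
   Requires d normalised, i.e. d >= 2^63. */
static uint64_t invert_limb(uint64_t d)
{
    /* (B^2 - 1) - B*d == (B - 1 - d)*B + (B - 1), and B - 1 - d == ~d */
    return (uint64_t)((((u128)~d << 64) | UINT64_MAX) / d);
}

/* Divide the two-limb value (u1,u0) by d using the inverse v.
   Requires u1 < d.  Returns the quotient, stores the remainder in *r_out. */
static uint64_t div_2by1_preinv(uint64_t *r_out, uint64_t u1, uint64_t u0,
                                uint64_t d, uint64_t v)
{
    /* Quotient estimate: (q1,q0) = v*u1 + (u1,u0); this fits in 128 bits. */
    u128 q = (u128)v * u1 + (((u128)u1 << 64) | u0);
    uint64_t q1 = (uint64_t)(q >> 64) + 1;   /* arithmetic is mod B */
    uint64_t q0 = (uint64_t)q;
    uint64_t r = u0 - q1 * d;                /* candidate remainder, mod B */

    if (r > q0) {        /* first correction; can be done branch-free */
        q1--;
        r += d;
    }
    if (r >= d) {        /* rare correction: assumed "adjustment branch" */
        q1++;
        r -= d;
    }
    *r_out = r;
    return q1;
}
```

Whether this second branch is taken depends on v, d and the input
limbs, which is why the data set up before timing the loop matters.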

