div_qr_1 interface
Torbjorn Granlund
tg at gmplib.org
Tue Oct 22 16:14:34 CEST 2013
I played more with the code, now trying to break the add-adc-sbb-cmov
chain, for the benefit of most Intel processors.
But I lack unit testing code for the function, making hacking quite
cumbersome. I don't feel safe hacking *any* GMP assembly code without
tests/devel/try.c's function and access checks.
The changes I wanted to try were:
(1) Shorten a dep chain, and avoid keeping CF live over an inc
instruction. The cmov doesn't really depend on sbb, since the
latter insn never really changes carry. (This might btw be useful
to teach loopmixer!)
(2) Reallocate Q2 to an "old" register (not r8-r15) and then use the
32-bit form of "adc $0,reg". That form is shorter.
(3) Offset UP to avoid the offset in the loop. The offset form has longer
load latency on some Intel CPUs. Also try the non-indexed form for QP
and UP.
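
To illustrate the size argument in (2), here is a minimal sketch in C that
builds the machine encoding of "adc $0,reg" (opcode 83 /2 ib) from the
standard x86-64 rules. The helper name encode_adc_imm0 is hypothetical and
not part of GMP. A REX prefix byte is needed whenever the operand is 64-bit
or the register is one of r8-r15, so only the 32-bit form on an "old"
register avoids it. Note that the 32-bit form zeroes the upper half of the
register, so this assumes Q2 only ever holds a small value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only: encode "adc $0, reg".
   reg is the hardware register number (0 = rax/eax ... 15 = r15/r15d),
   wide64 selects the 64-bit operand size.  Returns the length in bytes. */
static size_t encode_adc_imm0(unsigned reg, int wide64, unsigned char out[4])
{
    size_t n = 0;
    unsigned char rex = 0x40;          /* REX base pattern 0100WRXB */

    assert(reg < 16);
    if (wide64)
        rex |= 0x08;                   /* REX.W: 64-bit operand size */
    if (reg >= 8)
        rex |= 0x01;                   /* REX.B: extends ModRM.rm to r8-r15 */
    if (rex != 0x40)
        out[n++] = rex;                /* emit the prefix only when needed */

    out[n++] = 0x83;                   /* group 1, r/m with sign-extended imm8 */
    out[n++] = (unsigned char)(0xC0 | (2u << 3) | (reg & 7u)); /* mod=11, /2 = adc */
    out[n++] = 0x00;                   /* the immediate 0 */
    return n;
}
```

With this sketch, "adc $0,%r8" comes out as 49 83 D0 00 (4 bytes) and
"adc $0,%eax" as 83 D0 00 (3 bytes), which is the one byte saved per
instruction.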
--
Torbjörn