Some div_qr_2 assembler
tg at gmplib.org
Tue Mar 22 14:28:48 CET 2011
nisse at lysator.liu.se (Niels Möller) writes:
I've written a x86_64 loop for mpn_div_qr_2_pi1_norm, using 3/2
division. I had difficulty understanding the related assembler
implementation of divrem_2, so I wrote the new function from scratch.
Ehum. The only difference is that the new function does not support the
qxn argument. So removing the handling of that, without understanding
every detail of the code, seems like a better use of time than starting
from scratch and trying to make slower code run as fast as the already
optimised code.
Remove these insns:
cmp %r14, %r13 C
jg L(19) C
mov (%r12), %r10 C
sub $8, %r12 C
L(19): sub %r8, %r10 C ncp
This might also save a cycle or two. The old inner loop then becomes
several insns shorter than the new inner loop.
Rearranging parameters (since no qxn) is slightly more painful.
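
For readers unfamiliar with the "3/2 division" mentioned in the quoted message, here is a rough C sketch of one quotient step: three numerator limbs divided by a normalised two-limb divisor, using a precomputed reciprocal. It is not the assembler under discussion and not GMP's code. The 32-bit limb size, the names limb_t, u128, reciprocal_3by2, div_3by2 and div_3by2_matches_reference are all made up for illustration, and the reference arithmetic assumes the GCC/Clang unsigned __int128 extension.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t limb_t;         /* illustrative 32-bit limb, B = 2^32 (assumption) */
typedef unsigned __int128 u128;  /* GCC/Clang extension, reference arithmetic only */

/* Reciprocal v = floor((B^3 - 1) / <d1,d0>) - B, computed straight from
   the definition. Requires a normalised divisor (top bit of d1 set). */
static limb_t reciprocal_3by2(limb_t d1, limb_t d0)
{
    assert(d1 & 0x80000000u);
    uint64_t d = ((uint64_t)d1 << 32) | d0;
    u128 b = (u128)1 << 32;
    return (limb_t)((b * b * b - 1) / d - b);
}

/* One 3/2 step: returns q = floor(<n2,n1,n0> / <d1,d0>) and stores the
   two-limb remainder. Requires <n2,n1> < <d1,d0>, so q fits in one limb. */
static limb_t div_3by2(limb_t *r1p, limb_t *r0p,
                       limb_t n2, limb_t n1, limb_t n0,
                       limb_t d1, limb_t d0, limb_t v)
{
    uint64_t d = ((uint64_t)d1 << 32) | d0;
    assert((((uint64_t)n2 << 32) | n1) < d);

    /* Candidate quotient: <q, q0> = n2 * v + <n2, n1>. */
    uint64_t qq = (uint64_t)n2 * v + (((uint64_t)n2 << 32) | n1);
    limb_t q = (limb_t)(qq >> 32);
    limb_t q0 = (limb_t)qq;

    /* Candidate remainder n - (q + 1) * d, kept modulo B^2. */
    limb_t r1 = (limb_t)(n1 - q * d1);
    uint64_t r = (((uint64_t)r1 << 32) | n0) - d;
    r -= (uint64_t)d0 * q;
    q++;

    /* First adjustment: the candidate quotient was one too large. */
    if ((limb_t)(r >> 32) >= q0) {
        q--;
        r += d;
    }
    /* Second, rarely taken adjustment. */
    if (r >= d) {
        q++;
        r -= d;
    }
    *r1p = (limb_t)(r >> 32);
    *r0p = (limb_t)r;
    return q;
}

/* Compares div_3by2 against plain 128-bit division; returns 1 on agreement. */
static int div_3by2_matches_reference(limb_t n2, limb_t n1, limb_t n0,
                                      limb_t d1, limb_t d0)
{
    limb_t r1, r0;
    limb_t q = div_3by2(&r1, &r0, n2, n1, n0, d1, d0,
                        reciprocal_3by2(d1, d0));
    u128 n = ((u128)n2 << 64) | ((u128)n1 << 32) | n0;
    uint64_t d = ((uint64_t)d1 << 32) | d0;
    uint64_t r = ((uint64_t)r1 << 32) | r0;
    return (u128)q == n / d && (u128)r == n % d;
}
```

In a div_qr_2-style loop, the remainder from one such step would supply the two high numerator limbs of the next step, which is what keeps the <n2,n1> < <d1,d0> precondition true.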