Some div_qr_2 assembler

Torbjorn Granlund tg at
Tue Mar 22 14:28:48 CET 2011

nisse at (Niels Möller) writes:

  I've written an x86_64 loop for mpn_div_qr_2_pi1_norm, using 3/2
  division. I had difficulty understanding the related assembler
  implementation of divrem_2, so I wrote the new function from scratch.

Ehum.  The only difference is that the qxn argument is not supported in
mpn_div_qr_2_pi1_norm, right?

So removing the handling of that, without understanding every detail of
the code, seems like a better use of time than starting from scratch and
trying to make slower code run as fast as the already optimised code.

Remove these insns:

        cmp     %r14, %r13              C
        jg      L(19)                   C
        mov     (%r12), %r10            C
        sub     $8, %r12                C
L(19):  sub     %r8, %r10               C               ncp

This might also save a cycle or two.  The old inner loop then becomes
several insns shorter than the new inner loop.

Rearranging parameters (since no qxn) is slightly more painful.

