More div_qr_2 assembler

Fri Apr 1 17:58:53 CEST 2011

nisse at lysator.liu.se (Niels Möller) writes:

  1. There's an extra function call (speed mpn_div_qr_2n calls
     mpn_div_qr_2 which reads d1, d0, checks the high bit, computes the
     inverse, and then calls mpn_div_qr_2n_pi1).

I see.  When writing in assembly, we should probably provide
mpn_div_qr_2n as one entry point and mpn_div_qr_2n_pi1 as another, with
just a plain jmp to common code.

  2. Both call invert_limb to compute a 2/1 inverse, but the adjustments
     to a 3/2 inverse are done in assembler in divrem_2, while div_qr_2
     uses the C macro invert_pi1.

That would be fixed by the dual entry point trick.
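A C sketch of the dual entry point idea follows. It is not GMP's code. The _pi1 entry takes a precomputed 3/2 inverse. The other entry computes that inverse once, from the 2/1 inverse of d1 plus the adjustment step, and then continues into the same code. In C the shared path is a tail call; in assembly it would be the plain jmp. Assumptions: limbs are 32 bits (the hypothetical limb_t), so two-limb intermediates fit in uint64_t. The names div_qr_2n_sketch, div_qr_2n_pi1_sketch, invert_3by2 and div_3by2 are made up for illustration. The argument order follows the mpn_div_qr_2n_pi1 prototype quoted in point 3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit limb, so two-limb values fit in uint64_t. */
typedef uint32_t limb_t;

/* 3/2 inverse of a normalized divisor (d1, d0), with B = 2^32:
   floor((B^3 - 1) / (d1*B + d0)) - B.  Starts from the 2/1 inverse
   of d1 (what invert_limb computes) and adjusts it. */
static limb_t invert_3by2(limb_t d1, limb_t d0)
{
  assert(d1 >> 31);                     /* high bit of d1 must be set */
  limb_t v = (limb_t)(UINT64_MAX / d1 - ((uint64_t)1 << 32));
  limb_t p = d1 * v;                    /* low limb of d1*v */
  p += d0;
  if (p < d0) {                         /* carry: decrement v once or twice */
    v--;
    if (p >= d1) { v--; p -= d1; }
    p -= d1;
  }
  uint64_t t = (uint64_t)v * d0;        /* (t1, t0) = v * d0 */
  limb_t t1 = (limb_t)(t >> 32), t0 = (limb_t)t;
  p += t1;
  if (p < t1) {                         /* carry: decrement v once or twice */
    v--;
    if (p > d1 || (p == d1 && t0 >= d0))
      v--;
  }
  return v;
}

/* Divide (n2, n1, n0) by (d1, d0), assuming (n2, n1) < (d1, d0).
   Returns the quotient limb and stores the remainder in *r. */
static limb_t div_3by2(uint64_t *r, limb_t n2, limb_t n1, limb_t n0,
                       limb_t d1, limb_t d0, limb_t dinv)
{
  uint64_t d = ((uint64_t)d1 << 32) | d0;
  /* Candidate quotient (q, q0) = n2*dinv + (n2, n1). */
  uint64_t qq = (uint64_t)n2 * dinv + (((uint64_t)n2 << 32) | n1);
  limb_t q = (limb_t)(qq >> 32), q0 = (limb_t)qq;
  limb_t r1 = n1 - d1 * q;                                       /* mod B */
  uint64_t rr = ((((uint64_t)r1 << 32) | n0) - d) - (uint64_t)d0 * q; /* mod B^2 */
  q++;
  if ((limb_t)(rr >> 32) >= q0) { q--; rr += d; }  /* candidate was too large */
  if (rr >= d) { q++; rr -= d; }                  /* rare: one too small */
  *r = rr;
  return q;
}

/* Entry point 1 (_pi1 flavour): the caller supplies the inverse.
   Writes nn-2 quotient limbs to qp, the remainder to rp[0..1], and
   returns the high quotient limb (0 or 1).  Requires nn >= 2. */
limb_t div_qr_2n_pi1_sketch(limb_t *qp, limb_t *rp, const limb_t *np,
                            size_t nn, limb_t d1, limb_t d0, limb_t dinv)
{
  uint64_t d = ((uint64_t)d1 << 32) | d0;
  uint64_t r = ((uint64_t)np[nn - 1] << 32) | np[nn - 2];
  limb_t qh = 0;
  if (r >= d) { r -= d; qh = 1; }
  for (size_t i = nn - 2; i > 0; i--)
    qp[i - 1] = div_3by2(&r, (limb_t)(r >> 32), (limb_t)r, np[i - 1],
                         d1, d0, dinv);
  rp[0] = (limb_t)r;
  rp[1] = (limb_t)(r >> 32);
  return qh;
}

/* Entry point 2: computes the inverse in one place, then shares the
   common code (a tail call here, a plain jmp in assembly). */
limb_t div_qr_2n_sketch(limb_t *qp, limb_t *rp, const limb_t *np,
                        size_t nn, limb_t d1, limb_t d0)
{
  return div_qr_2n_pi1_sketch(qp, rp, np, nn, d1, d0, invert_3by2(d1, d0));
}
```

Both entries produce the same quotient and remainder. The only difference is where the inverse comes from, so the adjustment code exists once instead of once in assembler (divrem_2) and once in C (invert_pi1).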

  3. The argument list is longer for mpn_div_qr_2n_pi1 than for divrem_2:

               (mp_ptr qp, mp_ptr rp,
                mp_srcptr np, mp_size_t nn,
                mp_limb_t d1, mp_limb_t d0, mp_limb_t di)
     vs
               (mp_ptr qp, mp_size_t qxn,
                mp_ptr np, mp_size_t nn,
                mp_srcptr dp)

     The di argument is passed on the stack for x86_64. And passing rp
     means that one additional register must be saved and restored.

Oh, but shouldn't d1,d0 be passed just like any other 2-limb operand, as
a pointer to its low limb?
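The declarations below show how the arguments land in registers. They assume the System V AMD64 ABI (Linux, BSD, macOS), which passes the first six integer or pointer arguments in rdi, rsi, rdx, rcx, r8 and r9 and the rest on the stack. The third prototype is a hypothetical variant that passes the divisor as a pointer to its low limb, as suggested above. Its name, div_qr_2n_pi1_dp, is made up. With six arguments, the inverse still gets a register.

```c
#include <gmp.h>  /* mp_ptr, mp_srcptr, mp_size_t, mp_limb_t */

/* Seven arguments: di is the seventh, so it goes on the stack. */
mp_limb_t mpn_div_qr_2n_pi1 (mp_ptr qp,        /* rdi */
                             mp_ptr rp,        /* rsi */
                             mp_srcptr np,     /* rdx */
                             mp_size_t nn,     /* rcx */
                             mp_limb_t d1,     /* r8  */
                             mp_limb_t d0,     /* r9  */
                             mp_limb_t di);    /* stack: 8(%rsp) on entry */

/* Five arguments, all in registers. */
mp_limb_t mpn_divrem_2 (mp_ptr qp,             /* rdi */
                        mp_size_t qxn,         /* rsi */
                        mp_ptr np,             /* rdx */
                        mp_size_t nn,          /* rcx */
                        mp_srcptr dp);         /* r8  */

/* Hypothetical (not an existing GMP function): divisor by pointer,
   six arguments, all in registers. */
mp_limb_t div_qr_2n_pi1_dp (mp_ptr qp,         /* rdi */
                            mp_ptr rp,         /* rsi */
                            mp_srcptr np,      /* rdx */
                            mp_size_t nn,      /* rcx */
                            mp_srcptr dp,      /* r8: dp[0] = d0, dp[1] = d1 */
                            mp_limb_t di);     /* r9  */
```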

-- 
Torbjörn