More div_qr_2 assembler
tg at gmplib.org
Fri Apr 1 17:58:53 CEST 2011
nisse at lysator.liu.se (Niels Möller) writes:
1. There's an extra function call (speed mpn_div_qr_2n calls
mpn_div_qr_2 which reads d1, d0, checks the high bit, computes the
inverse, and then calls mpn_div_qr_2n_pi1).
I see. When writing in assembly, we should probably provide
mpn_div_qr_2n as one entry point and mpn_div_qr_2n_pi1 as another, with
just a plain jmp to common code.
2. Both call invert_limb to compute a 2/1 inverse, but the adjustment
to a 3/2 inverse is done in assembler in divrem_2, while div_qr_2
uses the C macro invert_pi1.
That would be fixed by the dual entry point trick.
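A minimal C sketch of the dual entry point idea, for illustration only.
It is not GMP code: the names toy_invert_limb, toy_div_qr_1n and
toy_div_qr_1n_pi1 are hypothetical, the limb is a toy 32-bit type, and
the divisor is a single normalized limb rather than the two limbs that
div_qr_2 handles. One entry point computes the inverse and continues
into the other, so the inverse setup and the division loop live in one
place. In assembly, that final call would be the plain jmp.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* toy limb; B = 2^32, so uint64_t holds two limbs */

/* Reciprocal of a normalized d (high bit set): floor((B^2 - 1) / d) - B. */
static limb_t toy_invert_limb(limb_t d)
{
    return (limb_t)(UINT64_MAX / d - ((uint64_t)1 << 32));
}

/* Entry point for callers that already have the inverse di.
   Divides {np, nn} by d, stores nn quotient limbs at qp and
   returns the remainder. */
limb_t toy_div_qr_1n_pi1(limb_t *qp, const limb_t *np, size_t nn,
                         limb_t d, limb_t di)
{
    limb_t r = 0;
    for (size_t i = nn; i-- > 0; ) {
        /* Divide the two-limb value (r, np[i]) by d, with r < d. */
        uint64_t q = (uint64_t)di * r + ((((uint64_t)r + 1) << 32) | np[i]);
        limb_t qh = (limb_t)(q >> 32);
        limb_t ql = (limb_t)q;
        limb_t rem = np[i] - qh * d;              /* arithmetic mod B */
        if (rem > ql) { qh--; rem += d; }         /* quotient estimate one too large */
        if (rem >= d) { qh++; rem -= d; }         /* rare final adjustment */
        qp[i] = qh;
        r = rem;
    }
    return r;
}

/* Entry point for callers without an inverse: compute it, then
   continue into the shared code above. */
limb_t toy_div_qr_1n(limb_t *qp, const limb_t *np, size_t nn, limb_t d)
{
    return toy_div_qr_1n_pi1(qp, np, nn, d, toy_invert_limb(d));
}
```

A C compiler may or may not turn the last call into a jump. Written
directly in assembly, the first entry point can simply fall through or
jump into the second.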
3. The argument list is longer for mpn_div_qr_2n_pi1 than for divrem_2,
(mp_ptr qp, mp_ptr rp,
mp_srcptr np, mp_size_t nn,
mp_limb_t d1, mp_limb_t d0, mp_limb_t di)
(mp_ptr qp, mp_size_t qxn,
mp_ptr np, mp_size_t nn,
The di argument is passed on the stack for x86_64. And passing rp
means that one additional register must be saved and restored.
Oh, but shouldn't d1,d0 be passed just like any other 2-limb operand, as
a pointer to its low limb?
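A small C sketch of the two calling conventions, assuming a 64-bit limb.
The function names toy_ge_2_val and toy_ge_2_ptr are hypothetical and
only illustrate the argument shapes. With the pointer form, a 2-limb
operand takes one argument slot instead of two, and the callee loads the
limbs itself (dp[0] is the low limb, dp[1] the high limb).

```c
#include <stdint.h>

typedef uint64_t limb_t;   /* assumed 64-bit limb */

/* By-value convention: each 2-limb operand uses two argument slots.
   Returns nonzero when (n1, n0) >= (d1, d0). */
int toy_ge_2_val(limb_t n1, limb_t n0, limb_t d1, limb_t d0)
{
    return n1 > d1 || (n1 == d1 && n0 >= d0);
}

/* Pointer convention: each operand is passed as a pointer to its
   low limb, so it uses a single argument slot. */
int toy_ge_2_ptr(const limb_t *np, const limb_t *dp)
{
    return toy_ge_2_val(np[1], np[0], dp[1], dp[0]);
}
```

Under the System V x86_64 ABI, which passes the first six integer
arguments in registers, replacing d1, d0 with one pointer would reduce
the seven-argument list quoted above to six, at the cost of two loads
in the callee.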