More div_qr_2 assembler

Niels Möller nisse at
Fri Apr 1 21:33:35 CEST 2011

Torbjorn Granlund <tg at> writes:

> I see.  When writing in assembly, we should probably provide
> mpn_div_qr_2n as one entry point and mpn_div_qr_2n_pi1 as another, with
> just a plain jmp to common code.

An alternative is to have an assembly inversion function, analogous to
the mod_1_*_cps functions. We have discussed earlier having separate,
preferably short, functions for _qr, _q and _r. Also, the
unnormalized functions would use an inverse of the same type, just with
shifted input. So if we do dual entry points for all these functions,
the inversion code will be duplicated in half a dozen places.
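
To make the idea concrete, here is a rough C sketch of what a standalone
inversion function could compute. It is not GMP code: it assumes 32-bit
limbs (B = 2^32) so that plain uint64_t arithmetic suffices, the names
invert_limb_sketch and invert_3by2_sketch are made up for illustration,
and the adjustment steps follow the usual 3/2 inverse definition,
floor((B^3 - 1) / (d1 B + d0)) - B, for a normalized divisor.

```c
#include <stdint.h>

/* 2/1 inverse of a normalized limb d (top bit set), with B = 2^32:
   floor((B^2 - 1) / d) - B.  Hypothetical helper, not the GMP routine. */
uint32_t invert_limb_sketch(uint32_t d)
{
    /* The quotient lies in [B, 2B), so truncating to 32 bits subtracts B. */
    return (uint32_t)(UINT64_MAX / d);
}

/* 3/2 inverse of the normalized two-limb divisor d1*B + d0:
   floor((B^3 - 1) / (d1*B + d0)) - B.  Hypothetical name. */
uint32_t invert_3by2_sketch(uint32_t d1, uint32_t d0)
{
    uint32_t v = invert_limb_sketch(d1);
    uint32_t p = d1 * v;            /* low limb of d1 * v */
    uint32_t t1, t0;
    uint64_t t;

    /* Account for the low divisor limb in the high part. */
    p += d0;
    if (p < d0) {                   /* carry out: v is too large */
        v--;
        if (p >= d1) {
            v--;
            p -= d1;
        }
        p -= d1;
    }

    /* Account for the high limb of d0 * v. */
    t = (uint64_t)d0 * v;
    t1 = (uint32_t)(t >> 32);
    t0 = (uint32_t)t;
    p += t1;
    if (p < t1) {                   /* carry out: adjust again */
        v--;
        if (p >= d1 && (p > d1 || t0 >= d0))
            v--;
    }
    return v;
}
```

With a function like this, each of the _qr, _q and _r variants would
take the precomputed inverse as an argument instead of carrying its own
copy of the inversion code.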

I think the name mpn_invert_3by2 is reasonable, do you agree?

BTW, for an assembly implementation, I wonder whether invert_limb could
return the remainder as a second value, in some register other than
%rax. I think one may be able to save a multiply in mpn_invert_3by2
(assuming it calls invert_limb, rather than copying that code inline).
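
One possible reading of where the multiply goes away, sketched in C
under the assumption of 32-bit limbs (B = 2^32): if v is the limb
inverse of d and r is the remainder, then (B + v) d + r = B^2 - 1, so
the low limb of d * v equals B - 1 - r, i.e. the bitwise complement of
r. The struct and function names below are hypothetical, and the struct
only stands in for a two-register return in assembly.

```c
#include <stdint.h>

/* Hypothetical two-value result; in assembly these would be two registers. */
struct limb_inverse_sketch {
    uint32_t inv;   /* floor((B^2 - 1) / d) - B */
    uint32_t rem;   /* (B^2 - 1) mod d */
};

/* Inverse and remainder for a normalized limb d (top bit set). */
struct limb_inverse_sketch invert_limb_rem_sketch(uint32_t d)
{
    struct limb_inverse_sketch r;
    r.inv = (uint32_t)(UINT64_MAX / d);  /* truncation subtracts B */
    r.rem = (uint32_t)(UINT64_MAX % d);
    return r;
}

/* Low limb of d * inv, obtained from the remainder without multiplying. */
uint32_t low_product_from_rem(struct limb_inverse_sketch r)
{
    return ~r.rem;
}
```

A caller computing the 3/2 inverse could then start from
low_product_from_rem() rather than multiplying d1 by the limb inverse.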

> Oh, but shouldn't d1,d0 be passed just like any other 2-limb operand; as
> a pointer to its low limb?

That might be better, yes. They're already read into registers, for the
inverse computation, but it's no optimization to pass them as separate
arguments if that causes additional stores to the stack for argument
passing.

For the 2u functions, I think the argument list should be shortened by
putting the normalized divisor limbs, the inverse, and the shift count,
into a struct and passing a pointer to that, like the extended gmp_pi1_t
we discussed the other day, in the context of schoolbook division.
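
As a rough illustration of such a struct, here is a C sketch assuming
32-bit limbs. The struct name, field names and init function are
hypothetical and are not the gmp_pi1_t layout; the inverse field is
left for a separate inversion routine to fill in from the normalized
limbs.

```c
#include <stdint.h>

/* Hypothetical parameter block for the unnormalized (2u) functions. */
struct div_qr_2u_params_sketch {
    uint32_t d1;     /* normalized high divisor limb (top bit set) */
    uint32_t d0;     /* normalized low divisor limb */
    uint32_t dinv;   /* 3/2 inverse of (d1, d0), filled in separately */
    unsigned shift;  /* normalization shift count, 0..31 */
};

/* Normalize the divisor d1*B + d0 and record the shift count.
   Requires d1 != 0. */
void div_qr_2u_params_init_sketch(struct div_qr_2u_params_sketch *p,
                                  uint32_t d1, uint32_t d0)
{
    unsigned shift = 0;

    /* Count leading zero bits of the high limb. */
    while (!((d1 << shift) & 0x80000000u))
        shift++;

    if (shift > 0) {
        d1 = (d1 << shift) | (d0 >> (32 - shift));
        d0 <<= shift;
    }
    p->d1 = d1;
    p->d0 = d0;
    p->shift = shift;
    p->dinv = 0;     /* placeholder until the inverse is computed */
}
```

The division functions would then take a single pointer to this block
in place of four separate arguments.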


Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
