More div_qr_2 assembler
tg at gmplib.org
Mon Apr 4 14:09:41 CEST 2011
nisse at lysator.liu.se (Niels Möller) writes:
> I see. When writing in assembly, we should probably provide
> mpn_div_qr_2n as one entry point and mpn_div_qr_2n_pi1 as another, with
> just a plain jmp to common code.
An alternative is to have an assembly inversion function, analogous to
the mod_1_*_cps functions.
Sure, this is a good candidate for assembly implementation.
We discussed earlier having separate,
preferably short, functions for _qr, _q and _r. And also the
unnormalized functions would use an inverse of the same type, just with
shifted input. So if we do dual entry points for all these functions,
the inversion code will be duplicated in half a dozen places.
Such duplication would be unfortunate, but if it gives a significant
performance boost, then we should consider it nevertheless.
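For illustration, the "separate inversion function plus _pi1 entry point" arrangement discussed above might look roughly like this in C. This is only a sketch on a simpler 2/1 division, not the 3/2 case of div_qr_2: the names div_2by1_cps, div_2by1_pi1, div_2by1 and the struct div_2by1_inv are hypothetical, and unsigned __int128 (a GCC/Clang extension) is assumed. The point is that the inversion code exists once, and the plain entry point is a thin wrapper around the _pi1 one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not GMP's actual interface. */
typedef unsigned __int128 u128;          /* assumed compiler extension */

typedef struct { uint64_t d, v; } div_2by1_inv;

/* Precomputation step, in the spirit of the mod_1_*_cps functions.
   For normalized d (high bit set), v = floor((B^2 - 1) / d) - B, B = 2^64. */
static void div_2by1_cps(div_2by1_inv *inv, uint64_t d)
{
    assert(d >> 63);                     /* d must be normalized */
    inv->d = d;
    /* The quotient lies in [B, 2B), so truncating to 64 bits subtracts B. */
    inv->v = (uint64_t)(~(u128)0 / d);
}

/* "_pi1" entry point: divide the two-limb number (u1,u0) by d using a
   precomputed inverse. Requires u1 < d. Returns the quotient, stores the
   remainder in *r. */
static uint64_t div_2by1_pi1(uint64_t *r, uint64_t u1, uint64_t u0,
                             const div_2by1_inv *inv)
{
    uint64_t d = inv->d;
    assert(u1 < d);

    /* Candidate quotient: high limb of v*u1 + (u1,u0), plus one. */
    u128 q = (u128)inv->v * u1 + (((u128)u1 << 64) | u0);
    uint64_t q1 = (uint64_t)(q >> 64) + 1;
    uint64_t q0 = (uint64_t)q;

    /* Candidate remainder, computed modulo B. */
    uint64_t rem = u0 - q1 * d;
    if (rem > q0) {                      /* quotient was one too large */
        q1--;
        rem += d;
    }
    if (rem >= d) {                      /* rare second adjustment */
        q1++;
        rem -= d;
    }
    *r = rem;
    return q1;
}

/* Plain entry point: compute the inverse, then fall into the common code. */
static uint64_t div_2by1(uint64_t *r, uint64_t u1, uint64_t u0, uint64_t d)
{
    div_2by1_inv inv;
    div_2by1_cps(&inv, d);
    return div_2by1_pi1(r, u1, u0, &inv);
}
```

A caller that divides by the same d many times would call div_2by1_cps once and then div_2by1_pi1 repeatedly. With dual entry points in assembly, the wrapper's call would instead be inlined inversion code followed by a jmp, which is where the duplication comes from.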
I think the name mpn_invert_3by2 is reasonable, do you agree?
BTW, for assembly implementation, I wonder if one could have invert_limb
have the remainder as a second return value, put in some other register
than %rax. I think one may be able to save a multiply in
mpn_invert_3by2 (assuming it calls invert_limb, rather than copies that
Are you suggesting that a separate assembly implementation of
mpn_invert_3by2 should call invert_limb? Should we not put it inline?
Returning extra bits in a register might work, but we need to consult the
ABI (ELF). It is possible that glue code is generated (at least for
shared libs) that could clobber some of the registers.
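As a point of comparison, here is how the "second return value" idea could be expressed in C. This is a sketch under assumptions: the name invert_limb_rem and the struct inv_rem are hypothetical, unsigned __int128 (a GCC/Clang extension) is assumed, and the remainder is taken to mean the one left over from the reciprocal division, which may not be exactly the quantity intended above. Under the x86-64 System V calling convention, a struct of two 64-bit integers like this one is returned in %rax and %rdx, so a C compiler already gets a two-register return without a private convention. A hand-written assembly routine using some other register would still face the ABI question raised above.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;          /* assumed compiler extension */

/* Hypothetical two-value result: the inverse and the division remainder. */
typedef struct { uint64_t v, rem; } inv_rem;

/* For normalized d, return v = floor((B^2 - 1) / d) - B together with
   rem = (B^2 - 1) mod d, so that (B + v) * d + rem == B^2 - 1, B = 2^64. */
static inv_rem invert_limb_rem(uint64_t d)
{
    assert(d >> 63);                     /* d must be normalized */
    u128 n = ~(u128)0;                   /* B^2 - 1 */
    inv_rem out;
    out.v = (uint64_t)(n / d);           /* truncation subtracts B */
    out.rem = (uint64_t)(n % d);
    return out;
}
```

A caller can then reuse rem instead of recomputing it with a multiply from v and d.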