bdiv_q_2.c improved
Torbjörn Granlund
tg at gmplib.org
Thu Oct 8 08:49:28 UTC 2015
Joe Keane <jgk at panix.com> writes:
> I need to use carry/add-with-carry somehow. My 'solution' is to code the
> whole thing in assembler, but maybe that is not necessary.
I agree that this is the type of function that would make sense in
assembly.
> > You might want to experiment with using a two-limb inverse.
> That's more complicated. :-)
We'd want several functions...
Function: bdiv_qr_1_pi1 alias divexact_1
Divisor size = 1, inverse size = 1. We have this function in C and
assembly, but we call it mpn_divexact_1 for historical reasons.
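For readers who have not seen the one-limb-inverse loop, here is a minimal C sketch of the technique. It is not the GMP source: the names sketch_binvert_limb and sketch_divexact_1 are made up for illustration, limbs are assumed to be 64 bits, the divisor is assumed odd, the division is assumed exact, and unsigned __int128 (a GCC/Clang extension) stands in for a widening multiply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* assumption: 64-bit limbs */

/* Inverse of an odd limb modulo 2^64 by Newton iteration.
   For odd d, d*d == 1 (mod 8), so d itself is correct to 3 bits;
   each step doubles the number of correct low bits. */
static limb_t sketch_binvert_limb(limb_t d)
{
    assert(d & 1);
    limb_t inv = d;
    for (int i = 0; i < 5; i++)       /* 3 -> 6 -> 12 -> 24 -> 48 -> 96 bits */
        inv *= 2 - d * inv;
    return inv;
}

/* Exact (Hensel) division of the n-limb number up[] by the odd limb d,
   least significant limb first.  dinv must satisfy d * dinv == 1 mod 2^64. */
static void sketch_divexact_1(limb_t *qp, const limb_t *up, size_t n,
                              limb_t d, limb_t dinv)
{
    limb_t c = 0;                     /* carry from the previous limb */
    for (size_t i = 0; i < n; i++) {
        limb_t u = up[i];
        limb_t borrow = u < c;        /* borrow out of u - c */
        limb_t q = (u - c) * dinv;    /* quotient limb, low half of product only */
        qp[i] = q;
        /* The high half of q*d must be subtracted from the next limb. */
        limb_t hi = (limb_t)(((unsigned __int128)q * d) >> 64);
        c = hi + borrow;              /* cannot overflow since hi <= d - 1 */
    }
}
```

Each quotient limb depends on the carry from the previous one through two multiplications, which is the dependency chain a longer inverse is meant to shorten.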
Function: bdiv_qr_1_pi2
Divisor size = 1, inverse size = 2. This will allow more micro-
parallelism than bdiv_qr_1_pi1 and should be faster for dividends of
more than a few limbs.
If the divisor is invariant over several dividends, bdiv_qr_1_pi2 should
always beat bdiv_qr_1_pi1.
We have this function (see https://gmplib.org/devel/asm.html) but it
seems no code is in the repo just yet. Not even the C code is there.
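As a rough illustration of what a two-limb inverse buys, the sketch below produces two quotient limbs per pass, so the carry chain is traversed once per two limbs. This is one simple reading of the idea, not the code referred to on the asm page, which may well be organized differently. The names sketch_binvert_2limb and sketch_bdiv_q_1_pi2 are hypothetical; 64-bit limbs, an odd divisor, an even limb count and the GCC/Clang unsigned __int128 type are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;             /* assumption: 64-bit limbs */
typedef unsigned __int128 dlimb_t;   /* two limbs; GCC/Clang extension */

/* Inverse of an odd limb modulo 2^128: the one-limb Newton iteration
   carried one step further, in double-limb arithmetic. */
static dlimb_t sketch_binvert_2limb(limb_t d)
{
    assert(d & 1);
    dlimb_t inv = d;                 /* correct to 3 bits for odd d */
    for (int i = 0; i < 6; i++)      /* 6, 12, 24, 48, 96, 192 bits */
        inv *= 2 - (dlimb_t)d * inv;
    return inv;
}

/* Exact division of up[0..n-1] by the odd limb d, two limbs per pass.
   dinv2 must satisfy d * dinv2 == 1 mod 2^128; n must be even here. */
static void sketch_bdiv_q_1_pi2(limb_t *qp, const limb_t *up, size_t n,
                                limb_t d, dlimb_t dinv2)
{
    assert((d & 1) && n % 2 == 0);
    limb_t c = 0;                    /* carry into the current limb pair */
    for (size_t i = 0; i < n; i += 2) {
        dlimb_t u = ((dlimb_t)up[i + 1] << 64) | up[i];
        limb_t borrow = u < c;
        dlimb_t q = (u - c) * dinv2; /* two quotient limbs at once */
        limb_t q0 = (limb_t)q;
        limb_t q1 = (limb_t)(q >> 64);
        qp[i] = q0;
        qp[i + 1] = q1;
        /* Next carry: floor(q * d / 2^128) plus the borrow. */
        dlimb_t t = (((dlimb_t)q0 * d) >> 64) + (dlimb_t)q1 * d;
        c = (limb_t)(t >> 64) + borrow;
    }
}
```

The extra Newton step in double-limb arithmetic is the added setup cost that has to be amortized when the divisor changes from call to call.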
Function: bdiv_qr_2_pi1
Function: bdiv_qr_2_pi2
Similar situation here. The _pi2 variant will be somewhat faster on
almost every CPU, but computing the two-limb inverse costs more, which
pushes up the break-even point when the divisor is not invariant.
(We might want two loop variants for each of these functions, one which
shifts the dividend on-the-fly and one which does not...)
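To make the shifting variant concrete, here is a hedged C sketch of a one-limb loop that accepts an even divisor by stripping its trailing zero bits and shifting the dividend right as limbs are read. The name sketch_divexact_1_shift and the helper are hypothetical; 64-bit limbs, exact division and the GCC/Clang unsigned __int128 type are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* assumption: 64-bit limbs */

/* Inverse of an odd limb modulo 2^64 by Newton iteration. */
static limb_t sketch_binvert_limb(limb_t d)
{
    limb_t inv = d;                   /* correct to 3 bits for odd d */
    for (int i = 0; i < 5; i++)
        inv *= 2 - d * inv;
    return inv;
}

/* Exact division of up[0..n-1] by a nonzero limb d that may be even.
   The dividend is shifted right on the fly instead of in a separate pass. */
static void sketch_divexact_1_shift(limb_t *qp, const limb_t *up, size_t n,
                                    limb_t d)
{
    assert(d != 0 && n > 0);
    unsigned shift = 0;
    while (!(d & 1)) {                /* strip factors of two from d */
        d >>= 1;
        shift++;
    }
    limb_t dinv = sketch_binvert_limb(d);
    limb_t c = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t u = up[i];
        if (shift != 0) {             /* avoid an undefined shift by 64 */
            limb_t next = (i + 1 < n) ? up[i + 1] : 0;
            u = (u >> shift) | (next << (64 - shift));
        }
        limb_t borrow = u < c;
        limb_t q = (u - c) * dinv;
        qp[i] = q;
        c = (limb_t)(((unsigned __int128)q * d) >> 64) + borrow;
    }
}
```

A non-shifting variant would drop the test and the two shifts from the loop, which is why it may be worth keeping both.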
--
Torbjörn
Please encrypt, key id 0xC8601622