bdiv_q_2.c improved

Torbjörn Granlund tg at gmplib.org
Thu Oct 8 08:49:28 UTC 2015


Joe Keane <jgk at panix.com> writes:

  I need to use carry/add-with-carry somehow.  My 'solution' is to code the
  whole thing in assembler, but maybe that is not necessary.
  
I agree that this is the type of function that would make sense in
assembly.
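
For reference, the usual way to get at the carry from plain C is to
compare each sum against one of its operands.  A minimal sketch,
assuming 64-bit limbs; the name sketch_add_n is made up for this
example and is not a GMP function.  Whether a compiler turns this into
add-with-carry instructions varies, which is part of why assembly is
attractive here.

```c
#include <stddef.h>
#include <stdint.h>

/* rp[] = ap[] + bp[] over n limbs; returns the carry out (0 or 1). */
uint64_t sketch_add_n(uint64_t *rp, const uint64_t *ap,
                      const uint64_t *bp, size_t n)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t s = ap[i] + carry;
        uint64_t c1 = s < carry;   /* wrapped while adding the carry in */
        uint64_t r = s + bp[i];
        uint64_t c2 = r < s;       /* wrapped while adding bp[i] */
        rp[i] = r;
        carry = c1 | c2;           /* at most one of the two can wrap */
    }
    return carry;
}
```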

  >You might want to experiment with using a two-limb inverse.
  
  That's more complicated. :-)

We'd want several functions...


Function: bdiv_qr_1_pi1 alias divexact_1

Divisor size = 1, inverse size = 1.  We have this function in C and
assembly, but we call it mpn_divexact_1 for historical reasons.
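
To make "divisor size = 1, inverse size = 1" concrete, here is a rough
sketch of the loop.  It assumes 64-bit limbs, an odd divisor, and the
unsigned __int128 extension of GCC and Clang.  The names
sketch_binvert_limb and sketch_bdiv_1_pi1 are invented for this
example; this is not the code in the GMP repo.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Inverse of an odd limb modulo 2^64 by Newton iteration.
   x = d is already correct to 3 bits because d*d == 1 (mod 8);
   each step doubles the number of correct low bits. */
uint64_t sketch_binvert_limb(uint64_t d)
{
    assert(d & 1);
    uint64_t x = d;
    for (int i = 0; i < 5; i++)    /* 3 -> 6 -> 12 -> 24 -> 48 -> 96 bits */
        x *= 2 - d * x;
    return x;
}

/* Hensel (2-adic) division of {up,n} by the odd limb d, given
   dinv = 1/d mod 2^64.  Writes n quotient limbs to qp and returns the
   final carry, which is 0 when d divides the dividend exactly. */
uint64_t sketch_bdiv_1_pi1(uint64_t *qp, const uint64_t *up, size_t n,
                           uint64_t d, uint64_t dinv)
{
    uint64_t c = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t u = up[i];
        uint64_t l = u - c;            /* subtract the incoming carry */
        uint64_t borrow = u < c;
        uint64_t q = l * dinv;         /* quotient limb, mod 2^64 */
        qp[i] = q;
        /* carry out = high limb of q*d, plus the borrow */
        c = (uint64_t)(((unsigned __int128)q * d) >> 64) + borrow;
    }
    return c;
}
```

Note that each quotient limb depends on the carry from the previous
one through a multiply by dinv followed by a multiply by d, so the loop
is latency-bound.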


Function: bdiv_qr_1_pi2

Divisor size = 1, inverse size = 2.  This will allow more
micro-parallelism than bdiv_qr_1_pi1 and should be faster for dividends
of more than a few limbs.

If the divisor is invariant over several dividends, bdiv_qr_1_pi2 should
always beat bdiv_qr_1_pi1.

We have this function (see https://gmplib.org/devel/asm.html) but it
seems no code is in the repo just yet.  Not even the C code is there.
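
One possible reading of "inverse size = 2", as a sketch only: invert
the divisor modulo 2^128 and produce two quotient limbs per iteration,
so the carry dependency is paid once per two limbs.  This is not the
code from the asm page, which may be organised quite differently.  It
assumes 64-bit limbs, an odd divisor, an even limb count, and the
unsigned __int128 extension of GCC and Clang; all names are invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef unsigned __int128 u128;

/* Inverse of an odd limb modulo 2^128 (two limbs) by Newton iteration. */
u128 sketch_binvert_2limb(uint64_t d)
{
    assert(d & 1);
    u128 x = d;                        /* correct to 3 bits */
    for (int i = 0; i < 6; i++)        /* 3 -> 6 -> ... -> 192 bits */
        x *= (u128)2 - (u128)d * x;
    return x;
}

/* Hensel division of {up,n} by the odd limb d with a two-limb inverse.
   n must be even in this sketch.  Returns the final carry (0 if exact). */
uint64_t sketch_bdiv_1_pi2(uint64_t *qp, const uint64_t *up, size_t n,
                           uint64_t d, u128 dinv)
{
    assert(n % 2 == 0);
    uint64_t c = 0;
    for (size_t i = 0; i < n; i += 2) {
        u128 u = ((u128)up[i + 1] << 64) | up[i];
        u128 l = u - c;                /* subtract the incoming carry */
        uint64_t borrow = u < c;
        u128 q = l * dinv;             /* two quotient limbs, mod 2^128 */
        uint64_t q0 = (uint64_t)q, q1 = (uint64_t)(q >> 64);
        qp[i] = q0;
        qp[i + 1] = q1;
        /* carry out = floor(q*d / 2^128), plus the borrow */
        u128 t = (u128)q1 * d + (((u128)q0 * d) >> 64);
        c = (uint64_t)(t >> 64) + borrow;
    }
    return c;
}
```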


Function: bdiv_qr_2_pi1
Function: bdiv_qr_2_pi2

Similar situation here.  The _pi2 variant will be somewhat faster on
almost every CPU.  The cost of computing the inverse will be higher,
though, which pushes up the break-even point when the divisor is not
invariant.


(We might want two loop variants for each of these functions, one which
shifts the dividend on-the-fly and one which does not...)
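
As an illustration of the shifting variant, here is a sketch for a
single-limb divisor.  It assumes the shift is there to handle even
divisors: the low zero bits are stripped from the divisor and the
dividend is shifted right by the same count as its limbs are read.  It
also assumes 64-bit limbs, exact division, and the unsigned __int128
extension of GCC and Clang; the names are invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Inverse of an odd limb modulo 2^64 by Newton iteration. */
uint64_t sketch_binvert_odd(uint64_t d)
{
    assert(d & 1);
    uint64_t x = d;                    /* correct to 3 bits */
    for (int i = 0; i < 5; i++)
        x *= 2 - d * x;
    return x;
}

/* {qp,n} = {up,n} / d for any nonzero d that divides the dividend. */
void sketch_divexact_1_shift(uint64_t *qp, const uint64_t *up, size_t n,
                             uint64_t d)
{
    assert(d != 0);
    unsigned k = 0;
    while ((d & 1) == 0) {             /* strip low zero bits of d */
        d >>= 1;
        k++;
    }
    uint64_t dinv = sketch_binvert_odd(d);
    uint64_t c = 0;
    for (size_t i = 0; i < n; i++) {
        /* limb i of the dividend shifted right by k bits */
        uint64_t s = up[i] >> k;
        if (k != 0 && i + 1 < n)
            s |= up[i + 1] << (64 - k);
        uint64_t l = s - c;
        uint64_t borrow = s < c;
        uint64_t q = l * dinv;
        qp[i] = q;
        c = (uint64_t)(((unsigned __int128)q * d) >> 64) + borrow;
    }
}
```

The non-shifting variant is the same loop with k fixed at 0, which
saves the two shifts and the OR per limb.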

-- 
Torbjörn
Please encrypt, key id 0xC8601622

