bdiv_q_2.c improved

Torbjörn Granlund tg at
Thu Oct 8 08:49:28 UTC 2015

Joe Keane <jgk at> writes:

  I need to use carry/add-with-carry somehow.  My 'solution' is to code the
  whole thing in assembler, but maybe that is not necessary.
I agree that this is the type of function that would make sense in

  >You might want to experiment with using a two-limb inverse.
  That's more complicated. :-)

We'd want several functions...

Function: bdiv_qr_1_pi1 alias divexact_1

Divisor size = 1, inverse size = 1.  We have this function in C and
assembly, but we call it mpn_divexact_1 for historical reasons.
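
To make the one-limb-inverse loop concrete, here is a minimal sketch.
It is not the GMP source.  It assumes 32-bit limbs so that plain
uint64_t can hold a product, and the names limb_t, binvert_limb32 and
sketch_bdiv_qr_1_pi1 are made up for illustration.  The divisor d must
be odd.  The return value c satisfies Q*d = U + c*B^n with B = 2^32, so
c = 0 means the division was exact.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;            /* assumption: 32-bit limbs */

/* Inverse of an odd d modulo 2^32, by Newton iteration. */
static limb_t binvert_limb32(limb_t d)
{
    limb_t inv = d;                 /* d*d == 1 (mod 8): 3 correct bits */
    for (int i = 0; i < 4; i++)     /* 3 -> 6 -> 12 -> 24 -> 48 bits */
        inv *= 2u - d * inv;
    return inv;
}

/* Hensel (2-adic) division of {up,n} by odd d, with di = 1/d mod 2^32.
   Writes n quotient limbs to qp and returns the final carry limb. */
static limb_t sketch_bdiv_qr_1_pi1(limb_t *qp, const limb_t *up, size_t n,
                                   limb_t d, limb_t di)
{
    limb_t c = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t s = up[i];
        limb_t l = s - c;           /* subtract incoming carry ...       */
        limb_t borrow = s < c;      /* ... and remember the borrow       */
        limb_t q = l * di;          /* quotient limb: q*d == l (mod B)   */
        qp[i] = q;
        limb_t h = (limb_t)(((uint64_t)q * d) >> 32);  /* high limb of q*d */
        c = h + borrow;             /* at most d, so it cannot overflow  */
    }
    return c;
}
```

Note that each quotient limb depends on the carry from the previous
limb through a multiply-low followed by a multiply-high, and that
dependency chain is what limits the speed of this loop.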

Function: bdiv_qr_1_pi2

Divisor size = 1, inverse size = 2.  This will allow more micro-
parallelism than bdiv_qr_1_pi1 and should be faster for dividends of
more than a few limbs.

If the divisor is invariant over several dividends, bdiv_qr_1_pi2 should
always beat bdiv_qr_1_pi1.
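
A rough sketch of the two-limb-inverse idea, again assuming 32-bit
limbs so that a limb pair fits in uint64_t.  The names limb_t,
binvert_2limbs and sketch_bdiv_qr_1_pi2 are hypothetical, and this is
only one possible way to organise the loop, not code from the repo.
Two quotient limbs come out of each iteration, so the carry chain is
traversed half as often.  The inverse takes one more Newton step than
the one-limb inverse, which is the extra cost that gets amortised when
the divisor is invariant.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;            /* assumption: 32-bit limbs */

/* Two-limb inverse: 1/d modulo 2^64 for odd d, by Newton iteration. */
static uint64_t binvert_2limbs(limb_t d)
{
    uint64_t inv = d;               /* 3 correct bits to start with */
    for (int i = 0; i < 5; i++)     /* 3 -> 6 -> 12 -> 24 -> 48 -> 96 */
        inv *= 2u - (uint64_t)d * inv;
    return inv;
}

/* Hensel division of {up,n} by odd d using di2 = 1/d mod 2^64.
   Returns the carry c with Q*d = U + c*B^n, B = 2^32. */
static limb_t sketch_bdiv_qr_1_pi2(limb_t *qp, const limb_t *up, size_t n,
                                   limb_t d, uint64_t di2)
{
    limb_t c = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        uint64_t s = (uint64_t)up[i + 1] << 32 | up[i];  /* limb pair */
        uint64_t l = s - c;
        limb_t borrow = s < c;
        uint64_t q = l * di2;       /* two quotient limbs at once */
        qp[i] = (limb_t)q;
        qp[i + 1] = (limb_t)(q >> 32);
        /* Top limb of the three-limb product q*d. */
        uint64_t mid = ((q & 0xffffffffu) * d) >> 32;
        limb_t h = (limb_t)(((q >> 32) * d + mid) >> 32);
        c = h + borrow;
    }
    if (i < n) {                    /* odd n: one ordinary one-limb step */
        limb_t s = up[i];
        limb_t l = s - c;
        limb_t borrow = s < c;
        limb_t q = l * (limb_t)di2; /* low limb of di2 is 1/d mod 2^32 */
        qp[i] = q;
        c = (limb_t)(((uint64_t)q * d) >> 32) + borrow;
    }
    return c;
}
```

The quotient and carry are the same as what a one-limb-inverse loop
would produce, since the Hensel quotient is unique.  Only the
scheduling of the multiplies differs.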

We have this function, but it seems no code is in the repo just yet.
Not even the C code is there.

Function: bdiv_qr_2_pi1
Function: bdiv_qr_2_pi2

Similar situation here.  The _pi2 variant will be somewhat faster on
almost every CPU.  Computing the inverse will cost more, though, which
pushes up the break-even point when the divisor is not invariant.

(We might want two loop variants for each of these functions, one which
shifts the dividend on-the-fly and one which does not...)
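
As an illustration of the shifting variant, here is a sketch of exact
division by an even single-limb divisor.  The power of two is stripped
from the divisor up front, and the dividend is shifted right by the
same amount inside the loop, so no shifted copy is made.  The same
assumptions apply as before: 32-bit limbs, hypothetical names (limb_t,
binvert_limb32, sketch_divexact_1_shift), and the dividend is assumed
to be an exact multiple of the nonzero divisor.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;            /* assumption: 32-bit limbs */

/* Inverse of an odd d modulo 2^32, by Newton iteration. */
static limb_t binvert_limb32(limb_t d)
{
    limb_t inv = d;
    for (int i = 0; i < 4; i++)
        inv *= 2u - d * inv;
    return inv;
}

/* Exact division {up,n} / d for nonzero d, shifting on the fly. */
static void sketch_divexact_1_shift(limb_t *qp, const limb_t *up,
                                    size_t n, limb_t d)
{
    unsigned shift = 0;
    while ((d & 1) == 0) {          /* strip trailing zero bits of d */
        d >>= 1;
        shift++;
    }
    limb_t di = binvert_limb32(d);
    limb_t c = 0;
    for (size_t i = 0; i < n; i++) {
        /* Limb i of the dividend shifted right by 'shift' bits. */
        limb_t s = up[i] >> shift;
        if (shift != 0 && i + 1 < n)
            s |= up[i + 1] << (32 - shift);
        limb_t l = s - c;
        limb_t borrow = s < c;
        limb_t q = l * di;
        qp[i] = q;
        c = (limb_t)(((uint64_t)q * d) >> 32) + borrow;
    }
}
```

When the divisor is already odd, shift is zero and the loop reduces to
the non-shifting form.  A real implementation would presumably keep
two separate loops, so the odd case does not pay for the extra test.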

