Division with on-the-fly normalization

Fri Mar 18 11:39:08 CET 2011

nisse at lysator.liu.se (Niels Möller) writes:

  I'm trying to figure out what is needed to integrate division code with
  on-the-fly normalization. I think the following should be done:

  * Extend gmp_pi1_t to something like

    typedef struct {
      /* Normalized most significant divisor limbs */
      mp_limb_t dh[3];
      mp_limb_t inv32;
      int shift;
    } gmp_pi1_t;

OK, so you're suggestng that one replicates the most significant part of
the divisor here?  I assume you want to keep it normalised, i.e., the
msb of dh[2] = 1...

  * Rename current invert_pi1 to invert_3by2.

  * Write a new invert_pi1 which normalizes d, uses invert_3by2, and fills
    in the other fields. Needs three limbs of d as input (pointer or
    separate arguments?). Or maybe it's getting too large for a macro, so
    it should be a function instead?

You're suggesting that we use the 5 word structure also for the case
where the divisor is already normalised, up from 1 word today?

At the lowest mpn level, I think we should probably have one loop per
function.  So we should have different functions for plain SB division
for the case where operands will be shifted on-the-fly and for the case
where they do not.  I don't thing the latter case should be burdened
down by a 5-word structure, since filling that out will take a few extra
cycles, and a few cycles matter for SB division.

Do you agree?

  * Change mpn_sbpi1_div_qr and friends to take a gmp_pi1_t * as argument
    (or it should really be const gmp_pi1_t *, right?)

  * Update callers to the new conventions.

  At this stage, the code ought to compile again. Next:

  * Add actual support for the unnormalized case (gmp_pi1_t->shift > 0) to
    one sb function at a time, and remove the normalization shifting in
    callers.

  * Fix any other normalization dependences, beyond the
    schoolbook functions. Are there any known dependencies, e.g.,
    in mpn_mu_div_*, and the inversion functions?

I am not sure I understand this question.  In general, the lowest-level
division functions today require normalisation, SB, DC, and MU.

But mpn_div_q does not require normalisation, since it is not
"lowest-level".  If the quotient is FUDGE smaller than the divisor, it
normalises the upper parts of the dividend and divisor, else all of
them.

While I agree that we should add SB division that does not require
normalisation, I am not so sure there is much point in rewriting the
larger-operand division code to handle unnormalised divisors.  For these
larger operands, the potential saved time would be a very small fraction
of the total time.  For SB division on the other hand, there should be
significant savings possible.

-- 
Torbjörn