bdiv vs redc

Wed Jun 27 12:31:40 CEST 2012

nisse at lysator.liu.se (Niels Möller) writes:

  Let us review the differences between bdiv and redc, and focus on the
  balanced case (since that's what redc uses).

Good idea!

  We have an odd modulo M of size n limbs, and a numerator U of size 2n
  limbs.

  bdiv computes

    Q = U M^-1 (mod B^n)
    R = B^{-n} (U - Q M)

  where Q is n limbs, and R is n limbs plus a bit, it's in the range

    -M < R < B^n

  The redc functions in gmp used to generate an R of n limbs and a Q of n
  limbs and a bit, so we had quite a bit of impedance mismatch. But with
  the current code, it seems that the redc functions compute

    Q = - U M^{-1} (mod B^n)
    R = B^{-n} (U + Q M)

  where Q is n limbs (but not returned), and R is n limbs plus a bit,

    0 <= R < B^n + M

  The redc functions return a carry, while the bdiv functions return a
  borrow.

I must admit that I didn't have this completely clear in my mind.

  I think the reason for the redc changes was to let the caller decide if
  the final conditional subtraction should be done unsing a constant time
  method (powm_sec) or faster but with data-dependent timing (regular
  powm).

Yes.  (And in some situation, the subtraction is not needed at all.
Paul Zimmermann told me that the GMP-ECM code does not need it.  It
might be possible to do without it also in powm.)

  But this means that if we were to implement a bdiv_r consistent with the
  other bdiv functions, one could replace

    cy = mpn_redc_* (rp, up, mp, n, ...)
    if (cy != 0)
      mpn_sub_n (rp, rp, mp, n)

  by

    cy = mpn_bdiv_r (rp, up, 2*n, mp, n, ...));
    if (cy != 0)
      mpn_add_n (rp, rp, mp, n);

What you're suggesting, is, I suppose:

1. Implement mpn_bdiv_r (at SB level, DC level, MU level)

2. Get rid of specific mpn_redc_* and use mpn_bdiv_r instead

I think this is a good idea.  We need to carefully watch for slowdown.
We need to generalise mpn/x86_64/redc_1.asm to become a
mpn_sbpi1_bdiv_r.  (I have several other asm redc_1 and recd_2 which are
not yet pushed.)

Here is a slight disadvantage of getting rid of redc: mpn_sbpi1_bdiv_r
will necessarily be more complex, probably using two outer loops than
mpn_redc_1.  This matters for asm implementations.

  That would be nice cleanup, I think. Now there are a couple of other
  differences between the bdiv and redc interfaces:

  1. The bdiv interface places the remainder at the high end of the U
     area, while redc arranges to put it at the low end. Can or should we
     change that? I guess it's possible to use a different convention for
     a new bdiv_r than for bdiv_qr. I guess the currrent bdiv_qr
     convention is beneficial for the dc routines, but I haven't looked at
     that code recently.

It would be trivial to fix at least mpn_sbpi1_bdiv_qr to pt the
remainder in the low end.  It probably makes some sense.  Or perhaps we
allow the quotient to be stored there?

  2. Obviously, the bdiv functions return the quotient. It seems
     mpn_sbpi1_bdiv_qr uses a loop very similar to redc, with addmul_1
     rather than submul_1. And then there's some logic to negate the
     quotient at the end; all that can be omitted for bdiv_r.

Indeed.

  3. There's a redc_2 function, but no pi2_bdiv functions.

That ought to be fixed (irrespective of the suggested merge).

  4. There's a dc_bdiv function, but no redc_dc. After some analysis, I
     think there's a place for redc_dc, even if we assume that the inverse
     computation for redc_n is free.

This is indeed an omission.

  5. There's a mu_bdiv function, using a smaller inverse than redc_n,
     which is the closest corresponding function on the redc side. I guess
     it makes sense to always use a full inverse for redc, given that it's
     used mainly when the modulo is invariant?

There should be a mpn_preinv_mu_bdiv_qr (we have the analogous Euclidian
division function).  That would cover the current redc_n, but also allow
for smaller pre-inverses for when the divisor/modulus invariance is
smaller.

-- 
Torbjörn