bdiv vs redc

Thu Jun 28 12:13:30 CEST 2012

  > It would be trivial to fix at least mpn_sbpi1_bdiv_qr to pt the
  > remainder in the low end.  It probably makes some sense.

  It doesn't look entirely trivial to me, in the unbalanced case. In
  general, of one does a redc in several steps (in the unbaanced qn > dn
  case, the schoolbook (sb) code uses a loop reducing dn limbs at a time,
  and the divide-and-conquer (dc) code also works in several iterations),
  I think it will be inconvenient to get the remainder anywhere but at the
  high end of the U area.

I see a final sub_n which looks like it could be adjusted to make the
remainder end up in the low end.

  One simple way would be to define mpn_bdiv_r_n and mpn_bdiv_qr_n
  (balanced division) with an additional rp input. This pointer would be
  allowed to point at the low or high half of U (or to any non-overlapping
  area). This would work fine at least for sb, not sure how easy it is for
  dc. This could be a common building block for both redc and unbalanced
  bdiv.

  > Or perhaps we allow the quotient to be stored there?

  No. That area is used for temporary storage for the addmul_1 carry
  limbs. And this optimization is also the reason why the sb code operates
  in blocks of size dn: After doing dn limbs, adding in the the carry
  limbs cannot be delayed any longer.

A comment says:
  /* FIXME: Add ASSERTs for allowable overlapping; i.e., that qp = np is OK,
     but some over N/Q overlaps will not work.  */
I think we changed this during last year.  Putting the quotient in that
place is a workaround for callers in the absence lf an _r function variant.

Are we talking about different functions?

It would be possible to remove the qp = np feature from mpn_sbpi1_bdiv_qr
if we added a mpn_sbpi1_bdiv_r.  That would allow us to return the
remainder at the low end.  But I have no informed opinion about where it
would be best to put the remainder.

-- 
Torbjörn