mpn_invert implementing ApproximateReciprocal.

Mon Dec 14 19:56:03 CET 2009

bodrato at mail.dm.unipi.it writes:

  > Marco, I would commit this code if I were you.

  I prefer not to commit, when I'm aware of a test failed by the new code.

I think that's a very wise principle!  :-)

  And "try mpn_invert" did not work with the approximated inversion
  replacing the current one.

Agreed.  If mon_invert had been made approximate, try.c would have
needed adjustments.

  > Perhaps we should have separate "mpn_invertappr" (following the naming
  > scheme of divappr) and mpn_invert.  But I'd suggest we just let this be
  > mpn_invert for now.

  I strongly prefer the "two separate" functions option, and I implemented it.

I prefer that too, in particular as you are doing the work.,,

  The code I'm going to commit provides, an "exact" function mpn_invert, and
  a faster-but-approximated one: mpn_invertappr (with its sub-functions _ni
  and _bc).

  I hope the included comment is clean enough in explaining the difference.

  /*
   All the three functions mpn{,_bc,_ni}_invertappr (ip, dp, n, scratch),
   take the strictly normalised value {dp,n} (i.e. most significant bit
   must be set) as an input, and compute {ip,n}: the approximate
   reciprocal of {dp,n}.

   Let e = mpn*_invertappr (ip, dp, n, scratch) be the returned value; the
   following conditions are satisfied by the output:
     0 <= e <= 1;
     {dp,n}*(B^n+{ip,n}) < B^{2n} <= {dp,n}*(B^n+{ip,n}+1+e) .
   I.e. e=0 means that the result {ip,n} equals the one given by mpn_invert.
        e=1 means that the result _may_ be one less than expected.

   The _bc version returns e=1 most of the times.
   The _ni version should return e=0 most of the time; only 1% of
   possible random input should give e=1.

   When the exact approximation is needed, i.e. e=0 in the relation above:
     {dp,n}*(B^n+{ip,n}) < B^{2n} <= {dp,n}*(B^n+{ip,n}+1) ;
   the function mpn_invert (ip, dp, n, scratch) should be used instead.
  */

"Exact approximation" is a bit odd.  The value computed is an
approximation (B^{2n}-1)/D, but is exactly floor((B^{2n}-1)/D).
I.e. no approximation compared to the ideal value.

  As far as I can see BINV_MULMOD_BNM1_THRESHOLD is always set to zero, this
  means that mulmod_bnm1 is always used when Newton is used... Great! But
  the problem is that the threshold for Newton was measured with the slower
  non-wrapping version of Newton.
  The measured BINV_NEWTON_THRESHOLD is probably too high because of this.

You're right.  Swapping the measurement order is also not right, as the
code is written.  I don't see how to do this easily, unless we assume
mpn_mul will never be relevant here.  I think I'll do that for now.

For mpn_binvert's usage if mul vs mulmod_bmn1 I think we could actually
use a more generally aiming parameter MUL_TO_MULMOD_BNM1_2NXN_THRESHOLD.
(The sub_1 used in binvert will not matter.)  I'll leave that for later.

Thanks a lot for the invert.c contribution!!

-- 
Torbjörn