mpn_invert implementing ApproximateReciprocal.
bodrato at mail.dm.unipi.it
bodrato at mail.dm.unipi.it
Fri Dec 11 14:55:28 CET 2009
Ciao,
I finally had some time to implement and test a version of mpn_invert
using Newton iteration, the code is attached.
I copy here what I wrote in the comments:
/*
Inspired by Algorithm "ApproximateReciprocal", published in
"Modern Computer Arithmetic" by Richard P. Brent and Paul Zimmermann,
algorithm 3.5, page 121 in version 0.4 of the book.
Some adaptation were introduced, to allow product mod B^m-1.
The iterative structure is copied from T.Granlund's binvert.c.
WARNING: This function compute an "approximate" inverse, this means
that sometimes the result will be one less than expected.
FIXME: the scratch for mulmod_bnm1 does not currently fit in the
scratch, it is allocated apart.
With USE_MUL_N = 0 and WRAP_MOD_BNM1 = 0, the iteration is conformant
to the algorithm described in the book.
USE_MUL_N = 1 introduce a correction in such a way that "the value of
B^{n+h}-T computed at step 8 cannot exceed B^n-1" (the book reads
"2B^n-1"). This correction should not require correction in the proof.
WRAP_MOD_BNM1 = 1 enables the wrapped product modulo B^m-1.
NOTE: is there any normalisation problem for the [0] class?
It shouldn't: we compute 2*|A*X_h - B^{n+h}| < B^m-1.
We may get [0] if and only if we get AX_h = B^{n+h}.
This can happen only if A=B^{n}/2, but this implies X_h = B^{h}*2-1
i.e. AX_h = B^{n+h} - A, then we get into the "negative" branch,
where X_h is not incremented (because A < B^n).
*/
The code seems slightly faster with USE_MUL_N enabled.
I send you a very small table of times (cycles) used to compute the
inverse of a number with "size" limbs.
Tests performed on shell.gmplib (64-bit Core i7, IIRC).
size current Newton Newton+bnm1
50 14616 13152 13172
150 92584 75720 73368
250 209064 169736 157912
350 353652 277704 260520
450 535476 419720 387276
The column "current" gives the time of the function mpn_invert currently
available in the repository. "Newton" is the time for the code using
Newton's iteration. The last column is the same code with WRAP_MOD_BNM1=1.
It is worth using Newton's iteration from around 30 limbs, the speed-up
introduced by WRAP_MOD_BNM1 is barely noticeable below 150 limbs and
starts being relevant above 200. Note that within the same environment
MUL_FFT_MODF_THRESHOLD = 400, i.e. the product mod B^n+1 using FFT can not
give any advantage so early.
There are two tunable threshold for this code:
INV_NEWTON_THRESHOLD and INV_MULMOD_BNM1_THRESHOLD.
The file was coded as a replacement for the current mpn/generic/invert.c,
but maybe we prefer to call this function mpn_approxinvert ? Or something
like that?
Let me know,
Marco
--
http://bodrato.it/papers/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: invert.c
Type: text/x-csrc
Size: 7681 bytes
Desc: not available
URL: <http://gmplib.org/list-archives/gmp-devel/attachments/20091211/a71222e1/attachment.bin>
More information about the gmp-devel
mailing list