fast inversion
bodrato at mail.dm.unipi.it
bodrato at mail.dm.unipi.it
Sun Apr 26 20:53:49 UTC 2015
Ciao Paul!
Sorry for the long delay, I've been off-line for a long while...
Il Mar, 3 Febbraio 2015 7:09 pm, paul zimmermann ha scritto:
> I did update my notes and code about fast inversion (which were used as
> basis for Algorithm 3.5 ApproximateReciprocal in "Modern Computer
Great news!
> The main change is that now the result is uniquely defined
> Comments are welcome.
After a first glance to the code, two lines surprise me:
mpn_com_n (tp, tp, n);
mpn_add_1 (tp, tp, n, ONE);
I wondered why you didn't use
mpn_neg_n (tp, tp, n);
Then I tested (on shell at gmplib) and...
@shell ~/gmp-repo$ tune/speed -s 1-1030 -f 2 -c mpn_neg mpn_com
mpn_add_1_inplace.1
overhead 6.78 cycles, precision 10000 units of 2.86e-10 secs, CPU freq
3500.08 MHz
mpn_neg mpn_com mpn_add_1_inplace.1
1 #5.68 12.54 6.80
2 9.40 13.65 #8.19
4 16.25 11.40 #8.22
8 31.56 16.01 #6.84
16 61.86 25.10 #8.16
32 139.01 44.79 #6.80
64 248.18 85.51 #8.20
128 472.77 206.21 #8.38
256 918.75 372.29 #8.21
512 1915.83 731.53 #6.87
1024 3689.67 1472.14 #8.29
...of course you are right. On many architectures we HAVE_NATIVE_mpn_com,
which is faster than the C loop when sizes grow.
On the note side:
The mpn_invert function in GMP currently uses the ApproximateReciprocal
with a final check-and-possibly-correct step triggered by a check of a
single limb. The main difference with respect to your note is that this
step is not performed for all recursion levels, but only once.
We have two functions: mpn_invert calls mpn_invertappr, the latter returns
a boolean; with your code, something like
return (up[2*h-l-1] + 3 <= CNST_LIMB(2)); /* X might be off by 1 */
And the former perform the final check if the return value is true.
The other difference is that you suggest to use plain multiplication or
the product modulo B^nm+1. On our side we use multiplication modulo B^nm
(to be honest we only truncate linear operations) or wraparound modulo
B^nm-1. I believe the results are the same obtained by the steps you
suggest. And I probably should change the CNST_LIMB(7) used in the last
check with some stricter value.
We should try if tune/speed is able to detect some speed difference
between the two implementations, probably not...
Best regards,
Marco
--
http://bodrato.it/
More information about the gmp-devel
mailing list