Paul.Zimmermann at inria.fr
Mon Apr 27 14:37:12 UTC 2015
thank you Marco for your feedback. At the time I wrote that code I'm not sure
mpn_neg did exist... Please tell me if you find a faster version of the code.
> Date: Sun, 26 Apr 2015 22:53:49 +0200 (CEST)
> From: bodrato at mail.dm.unipi.it
> Ciao Paul!
> Sorry for the long delay, I've been off-line for a long while...
> Il Mar, 3 Febbraio 2015 7:09 pm, paul zimmermann ha scritto:
> > I did update my notes and code about fast inversion (which were used as
> > basis for Algorithm 3.5 ApproximateReciprocal in "Modern Computer
> Great news!
> > The main change is that now the result is uniquely defined
> > Comments are welcome.
> After a first glance to the code, two lines surprise me:
> mpn_com_n (tp, tp, n);
> mpn_add_1 (tp, tp, n, ONE);
> I wondered why you didn't use
> mpn_neg_n (tp, tp, n);
> Then I tested (on shell at gmplib) and...
> @shell ~/gmp-repo$ tune/speed -s 1-1030 -f 2 -c mpn_neg mpn_com
> overhead 6.78 cycles, precision 10000 units of 2.86e-10 secs, CPU freq
> 3500.08 MHz
> mpn_neg mpn_com mpn_add_1_inplace.1
> 1 #5.68 12.54 6.80
> 2 9.40 13.65 #8.19
> 4 16.25 11.40 #8.22
> 8 31.56 16.01 #6.84
> 16 61.86 25.10 #8.16
> 32 139.01 44.79 #6.80
> 64 248.18 85.51 #8.20
> 128 472.77 206.21 #8.38
> 256 918.75 372.29 #8.21
> 512 1915.83 731.53 #6.87
> 1024 3689.67 1472.14 #8.29
> ...of course you are right. On many architectures we HAVE_NATIVE_mpn_com,
> which is faster than the C loop when sizes grow.
> On the note side:
> The mpn_invert function in GMP currently uses the ApproximateReciprocal
> with a final check-and-possibly-correct step triggered by a check of a
> single limb. The main difference with respect to your note is that this
> step is not performed for all recursion levels, but only once.
> We have two functions: mpn_invert calls mpn_invertappr, the latter returns
> a boolean; with your code, something like
> return (up[2*h-l-1] + 3 <= CNST_LIMB(2)); /* X might be off by 1 */
> And the former perform the final check if the return value is true.
> The other difference is that you suggest to use plain multiplication or
> the product modulo B^nm+1. On our side we use multiplication modulo B^nm
> (to be honest we only truncate linear operations) or wraparound modulo
> B^nm-1. I believe the results are the same obtained by the steps you
> suggest. And I probably should change the CNST_LIMB(7) used in the last
> check with some stricter value.
> We should try if tune/speed is able to detect some speed difference
> between the two implementations, probably not...
> Best regards,
More information about the gmp-devel