fast inversion

Mon Apr 27 14:37:12 UTC 2015

thank you Marco for your feedback. At the time I wrote that code I'm not sure
mpn_neg did exist... Please tell me if you find a faster version of the code.

Paul

> Date: Sun, 26 Apr 2015 22:53:49 +0200 (CEST)
> From: bodrato at mail.dm.unipi.it
> 
> Ciao Paul!
> 
> Sorry for the long delay, I've been off-line for a long while...
> 
> Il Mar, 3 Febbraio 2015 7:09 pm, paul zimmermann ha scritto:
> > I did update my notes and code about fast inversion (which were used as
> > basis for Algorithm 3.5 ApproximateReciprocal in "Modern Computer
> 
> Great news!
> 
> > The main change is that now the result is uniquely defined
> 
> > Comments are welcome.
> 
> After a first glance to the code, two lines surprise me:
>       mpn_com_n (tp, tp, n);
>       mpn_add_1 (tp, tp, n, ONE);
> I wondered why you didn't use
>       mpn_neg_n (tp, tp, n);
> Then I tested (on shell at gmplib) and...
> 
> @shell ~/gmp-repo$ tune/speed -s 1-1030 -f 2 -c mpn_neg mpn_com
> mpn_add_1_inplace.1
> overhead 6.78 cycles, precision 10000 units of 2.86e-10 secs, CPU freq
> 3500.08 MHz
>               mpn_neg       mpn_com mpn_add_1_inplace.1
> 1               #5.68         12.54          6.80
> 2                9.40         13.65         #8.19
> 4               16.25         11.40         #8.22
> 8               31.56         16.01         #6.84
> 16              61.86         25.10         #8.16
> 32             139.01         44.79         #6.80
> 64             248.18         85.51         #8.20
> 128            472.77        206.21         #8.38
> 256            918.75        372.29         #8.21
> 512           1915.83        731.53         #6.87
> 1024          3689.67       1472.14         #8.29
> 
> ...of course you are right. On many architectures we HAVE_NATIVE_mpn_com,
> which is faster than the C loop when sizes grow.
> 
> 
> On the note side:
> 
> The mpn_invert function in GMP currently uses the ApproximateReciprocal
> with a final check-and-possibly-correct step triggered by a check of a
> single limb. The main difference with respect to your note is that this
> step is not performed for all recursion levels, but only once.
> 
> We have two functions: mpn_invert calls mpn_invertappr, the latter returns
> a boolean; with your code, something like
>     return (up[2*h-l-1] + 3 <= CNST_LIMB(2)); /* X might be off by 1 */
> And the former perform the final check if the return value is true.
> 
> 
> The other difference is that you suggest to use plain multiplication or
> the product modulo B^nm+1. On our side we use multiplication modulo B^nm
> (to be honest we only truncate linear operations) or wraparound modulo
> B^nm-1. I believe the results are the same obtained by the steps you
> suggest. And I probably should change the CNST_LIMB(7) used in the last
> check with some stricter value.
> 
> We should try if tune/speed is able to detect some speed difference
> between the two implementations, probably not...
> 
> Best regards,
> Marco
> 
> -- 
> http://bodrato.it/