fast inversion

Mon May 18 06:36:05 UTC 2015

Ciao,

I pushed Niels' code for mpn_neg. The old timings were:

> @shell ~/gmp-repo$ tune/speed -s 1-1030 -f 2 -c mpn_neg mpn_com mpn_add_1_inplace.1
> overhead 6.78 cycles, precision 10000 units of 2.86e-10 secs, CPU freq 3500.08 MHz
>               mpn_neg       mpn_com mpn_add_1_inplace.1
> 1               #5.68         12.54          6.80
> 2                9.40         13.65         #8.19
> 4               16.25         11.40         #8.22
> 8               31.56         16.01         #6.84
> 16              61.86         25.10         #8.16
> 32             139.01         44.79         #6.80
> 64             248.18         85.51         #8.20
> 128            472.77        206.21         #8.38
> 256            918.75        372.29         #8.21
> 512           1915.83        731.53         #6.87
> 1024          3689.67       1472.14         #8.29

Now we have:

@shell ~/gmp-repo$ tune/speed -s 1-1030 -f 2 -c mpn_neg mpn_com
overhead 6.77 cycles, precision 10000 units of 2.86e-10 secs, CPU freq 3500.08 MHz
              mpn_neg       mpn_com
1               #3.41         12.53
2               20.38        #13.64
4               20.39        #11.38
8               23.83        #16.02
16              30.63        #25.00
32              48.08        #44.81
64              88.81        #85.34
128           #170.27        208.85
256            382.19       #374.52
512            747.29       #735.20
1024          1480.86      #1472.57

Compared with the old code, the new one is faster for n == 1, slower for
n == 2 and n == 4, faster again from n == 8, and more than twice as fast
for n >= 16.

> After a first glance to the code, two lines surprise me:
>       mpn_com_n (tp, tp, n);
>       mpn_add_1 (tp, tp, n, ONE);
> I wondered why you didn't use
>       mpn_neg_n (tp, tp, n);

Anyway, in your code you should probably write:
   mpn_com_n (tp + l, tp + l, h);
   mpn_add_1 (tp + l, tp + l, h, mpn_zero_p (tp, l));
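To spell out why the carry-in can be taken from mpn_zero_p: negating
X = H*B^l + L gives a high part equal to ~H + 1 when L == 0, and ~H
otherwise. Below is a minimal sketch of that identity in plain C. It does
not link against GMP; limb_t and the sk_* helpers are hypothetical
stand-ins for mp_limb_t, mpn_com, mpn_add_1 and mpn_zero_p, and I am
assuming the low l limbs are handled (or discarded) elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t limb_t;  /* hypothetical stand-in for mp_limb_t */

/* rp[i] = ~up[i] for 0 <= i < n (models mpn_com). */
static void sk_com(limb_t *rp, const limb_t *up, size_t n)
{
  for (size_t i = 0; i < n; i++)
    rp[i] = ~up[i];
}

/* {rp,n} = {up,n} + b, returns the carry out (models mpn_add_1). */
static limb_t sk_add_1(limb_t *rp, const limb_t *up, size_t n, limb_t b)
{
  for (size_t i = 0; i < n; i++) {
    limb_t s = up[i] + b;
    b = s < b;            /* 1 if the addition wrapped around */
    rp[i] = s;
  }
  return b;
}

/* 1 if all n limbs are zero, else 0 (models mpn_zero_p). */
static int sk_zero_p(const limb_t *up, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (up[i] != 0)
      return 0;
  return 1;
}

/* Reference: negate all n limbs in place as complement plus one. */
static void sk_neg_whole(limb_t *tp, size_t n)
{
  sk_com(tp, tp, n);
  sk_add_1(tp, tp, n, 1);
}

/* Negate only the high h limbs of {tp, l+h}.  The carry into the high
   part is 1 exactly when the low l limbs are all zero. */
static void sk_neg_high(limb_t *tp, size_t l, size_t h)
{
  limb_t cy = (limb_t) sk_zero_p(tp, l);
  sk_com(tp + l, tp + l, h);
  sk_add_1(tp + l, tp + l, h, cy);
}

/* Returns 1 if sk_neg_high yields the same high limbs as a full
   negation of the whole operand.  Works on copies of at most 8 limbs. */
static int sk_check(const limb_t *xp, size_t l, size_t h)
{
  limb_t a[8], b[8];
  assert(l + h <= 8);
  memcpy(a, xp, (l + h) * sizeof(limb_t));
  memcpy(b, xp, (l + h) * sizeof(limb_t));
  sk_neg_whole(a, l + h);
  sk_neg_high(b, l, h);
  return memcmp(a + l, b + l, h * sizeof(limb_t)) == 0;
}
```

Calling sk_check on an operand with a zero low part and on one with a
non-zero low part exercises both values of the carry-in. With the real
functions the same two calls touch only h limbs instead of n.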

Regards,
m

-- 
http://bodrato.it/papers/