fast inversion
bodrato at mail.dm.unipi.it
Mon May 18 06:36:05 UTC 2015
Ciao,
I pushed Niels' code for mpn_neg. The old timings were:
> @shell ~/gmp-repo$ tune/speed -s 1-1030 -f 2 -c mpn_neg mpn_com mpn_add_1_inplace.1
> overhead 6.78 cycles, precision 10000 units of 2.86e-10 secs, CPU freq 3500.08 MHz
>          mpn_neg    mpn_com  mpn_add_1_inplace.1
> 1          #5.68      12.54                 6.80
> 2           9.40      13.65                #8.19
> 4          16.25      11.40                #8.22
> 8          31.56      16.01                #6.84
> 16         61.86      25.10                #8.16
> 32        139.01      44.79                #6.80
> 64        248.18      85.51                #8.20
> 128       472.77     206.21                #8.38
> 256       918.75     372.29                #8.21
> 512      1915.83     731.53                #6.87
> 1024     3689.67    1472.14                #8.29
Now we have:
@shell ~/gmp-repo$ tune/speed -s 1-1030 -f 2 -c mpn_neg mpn_com
overhead 6.77 cycles, precision 10000 units of 2.86e-10 secs, CPU freq 3500.08 MHz
         mpn_neg    mpn_com
1          #3.41      12.53
2          20.38     #13.64
4          20.39     #11.38
8          23.83     #16.02
16         30.63     #25.00
32         48.08     #44.81
64         88.81     #85.34
128      #170.27     208.85
256       382.19    #374.52
512       747.29    #735.20
1024     1480.86   #1472.57
The new code is faster for n == 1, slower for 2 <= n <= 4, faster again
at n == 8, and more than twice as fast for n >= 16.
> After a first glance to the code, two lines surprise me:
> mpn_com_n (tp, tp, n);
> mpn_add_1 (tp, tp, n, ONE);
> I wondered why you didn't use
> mpn_neg_n (tp, tp, n);
Anyway, in your code you should probably write:
mpn_com_n (tp + l, tp + l, h);
mpn_add_1 (tp + l, tp + l, h, mpn_zero_p (tp, l));
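To spell out why the carry limb can be mpn_zero_p (tp, l): -x = ~x + 1, and the +1 reaches the high h limbs only when the low l limbs are all zero. A low part is zero exactly when its negation is zero, so the test gives the same answer before or after the low part is negated. Below is a minimal sketch in plain C, not GMP code: limb_t and the limbs_* helpers are hypothetical stand-ins for mp_limb_t, mpn_com, mpn_add_1 and mpn_zero_p, and I am assuming the low part is negated separately.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;            /* hypothetical stand-in for mp_limb_t */

/* rp[] = ~up[], limb by limb (the role mpn_com plays). */
static void limbs_com(limb_t *rp, const limb_t *up, size_t n)
{
    for (size_t i = 0; i < n; i++)
        rp[i] = ~up[i];
}

/* rp[] = up[] + b, returns the carry out (the role mpn_add_1 plays). */
static limb_t limbs_add_1(limb_t *rp, const limb_t *up, size_t n, limb_t b)
{
    for (size_t i = 0; i < n; i++) {
        limb_t s = up[i] + b;
        b = s < b;                  /* 1 if this limb overflowed, else 0 */
        rp[i] = s;
    }
    return b;
}

/* 1 if all n limbs are zero, else 0 (the role mpn_zero_p plays). */
static int limbs_zero_p(const limb_t *up, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (up[i] != 0)
            return 0;
    return 1;
}

/* Negate n limbs in place: -x = ~x + 1. */
static void limbs_neg(limb_t *xp, size_t n)
{
    limbs_com(xp, xp, n);
    limbs_add_1(xp, xp, n, 1);
}

/* Negate only the high h limbs; the carry in is 1 iff the low l limbs are zero. */
static void limbs_neg_high(limb_t *xp, size_t l, size_t h)
{
    limbs_com(xp + l, xp + l, h);
    limbs_add_1(xp + l, xp + l, h, (limb_t) limbs_zero_p(xp, l));
}

/* Check that low-then-high negation matches negating all limbs at once. */
static void limbs_neg_demo(void)
{
    limb_t full[4]  = {0, 0, 5, 7}; /* zero low part: carry reaches the high part */
    limb_t split[4] = {0, 0, 5, 7};

    limbs_neg(full, 4);
    limbs_neg(split, 2);            /* low l = 2 limbs */
    limbs_neg_high(split, 2, 2);    /* high h = 2 limbs */
    for (size_t i = 0; i < 4; i++)
        assert(full[i] == split[i]);
}
```

With a nonzero low part the carry limb is 0 and the high part is only complemented, which is what the full negation produces as well.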
Regards,
m
--
http://bodrato.it/papers/