ARM public key benchmark
Niels Möller
nisse at lysator.liu.se
Tue Apr 2 15:14:18 CEST 2013
nisse at lysator.liu.se (Niels Möller) writes:
> I'm not yet using GMP's mpn_cnd_{add,sub}_n, that's the next thing I'd
> like to try.
That wasn't a clear win. As a fallback I use addmul_1 and submul_1
(I always operate in place, so that works). Now, cnd_sub_n beats
submul_1 except for n = 1 and n = 2, sizes I don't use:
$ GMP_CPU_FREQUENCY=1e9 ./speed -C -s 1-10,100 mpn_submul_1.1 mpn_cnd_sub_n
clock_gettime is 1.000ns accurate
overhead 8.87 cycles, precision 1000 units of 1.00e-06 secs, CPU freq 1000.00 MHz
     mpn_submul_1.1 mpn_cnd_sub_n
1          #19.8927       21.6831
2          #10.9752       12.4106
3            9.5514       #8.9371
4            8.5227       #6.6696
5            7.8316       #6.7412
6            7.1571       #6.0339
7            7.2859       #5.3320
8            6.8553       #4.8715
9            6.6945       #5.0376
10           6.3129       #4.8351
100          5.5065       #3.2110
But for addition, mpn_addmul_1 beats mpn_cnd_add_n at every small size tested except n = 8:
$ GMP_CPU_FREQUENCY=1e9 ./speed -C -s 1-10,100 mpn_addmul_1.1 mpn_cnd_add_n
clock_gettime is 1.000ns accurate
overhead 8.94 cycles, precision 1000 units of 1.00e-06 secs, CPU freq 1000.00 MHz
     mpn_addmul_1.1 mpn_cnd_add_n
1          #19.8927       21.2256
2          #10.8574       11.6940
3           #8.0235        8.5240
4           #6.4561        6.5216
5           #6.0308        6.5071
6           #5.4937        5.9282
7           #5.2063        5.3603
8            4.8838       #4.7493
9           #4.9249        4.9533
10          #4.5364        4.8244
100          3.4846       #3.2842
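The fallback mentioned above can presumably be pictured like this: a conditional in-place add, r += (cnd ? a : 0), becomes an addmul_1 with the condition as a 0/1 multiplier, while a dedicated cnd_add_n masks the addend instead. The C sketch below assumes 32-bit limbs as on 32-bit ARM; addmul_1_ref, cnd_add_via_addmul and cnd_add_n_ref are hypothetical portable stand-ins, not GMP's assembly routines, although cnd_add_n_ref copies the (cnd, rp, s1p, s2p, n) argument order of GMP's mpn_cnd_add_n. Note that the addmul inner step ap[i]*v + rp[i] + cy is exactly one ARM umaal operation (RdHi:RdLo = Rn*Rm + RdHi + RdLo), which may be part of why addmul_1 is hard to beat at small sizes.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t; /* assumption: 32-bit limbs, as on 32-bit ARM */

/* Hypothetical portable stand-in for mpn_addmul_1:
   {rp,n} += {ap,n} * v, returning the high (carry) limb. */
static limb_t addmul_1_ref(limb_t *rp, const limb_t *ap, size_t n, limb_t v)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        /* ap[i]*v + rp[i] + cy always fits in 64 bits; this single
           step is what one umaal instruction computes. */
        uint64_t t = (uint64_t)ap[i] * v + rp[i] + cy;
        rp[i] = (limb_t)t;
        cy = (limb_t)(t >> 32);
    }
    return cy;
}

/* Fallback: conditional in-place add r += (cnd ? a : 0), expressed as
   addmul_1 with a 0/1 multiplier. Same work for either value of cnd. */
static limb_t cnd_add_via_addmul(limb_t cnd, limb_t *rp, const limb_t *ap,
                                 size_t n)
{
    return addmul_1_ref(rp, ap, n, cnd != 0);
}

/* Hypothetical stand-in with mpn_cnd_add_n's argument order:
   {rp,n} = {s1p,n} + (cnd ? {s2p,n} : 0), masking the addend.
   A C compiler is not guaranteed to emit branch-free code for the
   comparisons below; side-channel-silent versions are assembly. */
static limb_t cnd_add_n_ref(limb_t cnd, limb_t *rp, const limb_t *s1p,
                            const limb_t *s2p, size_t n)
{
    limb_t mask = -(limb_t)(cnd != 0); /* all ones if cnd, else zero */
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t b = s2p[i] & mask;
        limb_t s = s1p[i] + b;
        limb_t c1 = s < b;  /* carry out of s1 + b */
        limb_t r = s + cy;
        limb_t c2 = r < cy; /* carry out of adding the incoming carry */
        rp[i] = r;
        cy = c1 | c2;       /* at most one of c1, c2 can be set */
    }
    return cy;
}
```

Both variants give identical limbs and carry for in-place use (rp == s1p); they differ only in how the per-limb work maps onto instructions, which is what the timings compare.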
Some questions:
1. I guess one can expect submul_1 to always be a bit slower than
addmul_1, since submul_1 needs additional arithmetic besides the
umaal instruction? One could perhaps do some negations on the fly,
a - b*C = -((-a) + b*C); maybe that would be advantageous?
2. cnd_add_n should be at least as fast as addmul_1, shouldn't it? It
appears to be 0.25 c/l faster for larger operands, so maybe it's "only"
a question of optimizing loop setup and feedin?
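Question 1 can be made concrete. Working mod B^n with B = 2^32, substituting -x = ~x + 1 (where ~ complements every limb) into a - b*C = -((-a) + b*C) gives a - b*C = ~(~a + b*C), so both negations become carry-free limb complements that could in principle be folded into an addmul loop. Moreover, if ~a + b*C = res + cy*B^n, then a - b*C = ~res - cy*B^n, so the carry limb of the addmul is exactly the borrow limb submul_1 must return. The C sketch below checks this rewrite against a plain submul loop. It assumes 32-bit limbs, and addmul_1_sketch, submul_1_sketch and submul_1_via_complement are hypothetical portable helpers, not GMP code. It says nothing about speed: here the complements are separate passes, and whether folding them into an umaal loop beats a dedicated submul_1 on a given ARM core would have to be measured with speed as above.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t; /* assumption: 32-bit limbs, as on 32-bit ARM */

/* Portable stand-in for mpn_addmul_1: {rp,n} += {ap,n} * v, returns the
   carry limb. Each iteration is one umaal-style step. */
static limb_t addmul_1_sketch(limb_t *rp, const limb_t *ap, size_t n, limb_t v)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t t = (uint64_t)ap[i] * v + rp[i] + cy;
        rp[i] = (limb_t)t;
        cy = (limb_t)(t >> 32);
    }
    return cy;
}

/* Portable stand-in for mpn_submul_1: {rp,n} -= {ap,n} * v, returns the
   borrow limb. Note the extra compare per limb compared to addmul. */
static limb_t submul_1_sketch(limb_t *rp, const limb_t *ap, size_t n, limb_t v)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t p = (uint64_t)ap[i] * v + cy; /* product plus incoming borrow */
        limb_t lo = (limb_t)p;
        limb_t r = rp[i];
        rp[i] = r - lo;
        cy = (limb_t)(p >> 32) + (r < lo);     /* high half plus borrow of r - lo */
    }
    return cy;
}

/* submul_1 via the identity a - b*C = ~(~a + b*C) (mod B^n), with
   a = {rp,n}, b = {ap,n}, C = v. The addmul carry is the submul borrow. */
static limb_t submul_1_via_complement(limb_t *rp, const limb_t *ap, size_t n,
                                      limb_t v)
{
    for (size_t i = 0; i < n; i++)
        rp[i] = ~rp[i];                        /* rp = ~a */
    limb_t cy = addmul_1_sketch(rp, ap, n, v); /* ~a + b*C = res + cy*B^n */
    for (size_t i = 0; i < n; i++)
        rp[i] = ~rp[i];                        /* a - b*C = ~res - cy*B^n */
    return cy;
}
```

For example, with n = 1, a = 5, b = 3, C = 2, both functions leave 0xffffffff in the limb and return a borrow of 1, since 5 - 6 = -1.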
Regards,
/Niels
--
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
More information about the gmp-devel mailing list