ARM public key benchmark
nisse at lysator.liu.se
Wed Apr 3 09:29:19 CEST 2013
Torbjorn Granlund <tg at gmplib.org> writes:
> nisse at lysator.liu.se (Niels Möller) writes:
> But for addition, mpn_addmul_1 beats mpn_cnd_add_n for many small sizes,
> 6 #5.4937 5.9282
> Not an alarming difference.
Maybe not, but I got a measurable slowdown of some ECC operations when
switching to mpn_cnd_add_n, and my best guess is that this is the reason.
> 1. I guess one can expect submul_1 to always be a bit slower than
> addmul_1, since submul_1 needs additional arithmetic besides the
> umaal? One could perhaps do some negations on the fly, a - b*C =
> -((-a) + b*C), maybe that would be advantageous?
> I encourage you to work on that; 3.25 c/l vs 5.25 c/l seems like a very
> large difference between addmul_1 and submul_1.
After some further thinking, it should work fine with one's complement
rather than two's complement for the negations,
a - b*C = ~(b*C + ~a) (if we do the complements on n+1 limbs)
So it should be doable with the addmul_1 loop plus two additional NOT
instructions per limb, neither of them on the recurrency path, and then
maybe some extra logic for the return value. One could aim for 4.25 c/l,
I guess.
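A minimal C sketch of the one's-complement trick, meant only to check the identity, not to model the ARM umaal loop. With limb base B, ~x = B^(n+1) - 1 - x on n+1 limbs, so ~(b*C + ~a) = B^(n+1) - 1 - (b*C + B^(n+1) - 1 - a) = a - b*C (mod B^(n+1)). Since the implicit top limb of a is 0, ~a has top limb ~0, and the top limb of the result works out to -cy, where cy is the addmul_1 carry; so in this sketch the borrow that submul_1 returns is just that carry. The routines ref_addmul_1, ref_submul_1, submul_1_via_addmul_1 and submul_variants_agree are hypothetical portable stand-ins with 32-bit limbs, not GMP's mpn code; a real loop would fuse the two NOTs into the addmul_1 loop rather than make separate passes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit limbs so every 32x32 product fits in uint64_t. */
typedef uint32_t limb_t;

/* Portable stand-in for mpn_addmul_1 (not GMP's code):
   rp[0..n-1] += up[0..n-1] * v, returns the carry-out limb. */
static limb_t ref_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    assert(n > 0);
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t t = (uint64_t)up[i] * v + rp[i] + cy;  /* at most 2^64 - 1 */
        rp[i] = (limb_t)t;
        cy = (limb_t)(t >> 32);
    }
    return cy;
}

/* Portable stand-in for mpn_submul_1: rp[] -= up[] * v, returns the borrow limb. */
static limb_t ref_submul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    assert(n > 0);
    limb_t borrow = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t p = (uint64_t)up[i] * v + borrow;
        limb_t lo = (limb_t)p, hi = (limb_t)(p >> 32);
        limb_t r = rp[i];
        rp[i] = r - lo;
        borrow = hi + (r < lo);     /* extra borrow if this limb wrapped */
    }
    return borrow;
}

/* submul_1 from addmul_1 plus NOTs: a - b*v = ~(b*v + ~a) on n+1 limbs.
   Top limb of ~a is ~0, so the result's top limb is ~(~0 + cy) = -cy,
   which means the borrow equals the addmul_1 carry. */
static limb_t submul_1_via_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    for (size_t i = 0; i < n; i++)
        rp[i] = ~rp[i];                         /* ~a */
    limb_t cy = ref_addmul_1(rp, up, n, v);     /* b*v + ~a */
    for (size_t i = 0; i < n; i++)
        rp[i] = ~rp[i];                         /* a - b*v mod B^n */
    return cy;
}

/* Returns 1 if both submul_1 routes agree on pseudo-random operands. */
static int submul_variants_agree(uint32_t seed, size_t n, limb_t v)
{
    enum { MAX_N = 16 };
    limb_t up[MAX_N], r1[MAX_N], r2[MAX_N];
    if (n == 0 || n > MAX_N)
        return 0;
    for (size_t i = 0; i < n; i++) {
        seed = seed * 1664525u + 1013904223u;   /* simple LCG for test data */
        up[i] = seed;
        seed = seed * 1664525u + 1013904223u;
        r1[i] = r2[i] = seed;
    }
    limb_t b1 = ref_submul_1(r1, up, n, v);
    limb_t b2 = submul_1_via_addmul_1(r2, up, n, v);
    if (b1 != b2)
        return 0;
    for (size_t i = 0; i < n; i++)
        if (r1[i] != r2[i])
            return 0;
    return 1;
}
```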
> I've never considered addmul_1/submul_1 as alternatives to
But they are, except that addmul_1/submul_1 always work in-place. They
should be side-channel silent on the same machines where, e.g., mul_1 is
side-channel silent, right?
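To illustrate addmul_1 standing in for a conditional add: with a multiplier that is exactly 0 or 1, rp += up * m either adds up or leaves rp unchanged, and the sequence of limb accesses does not depend on the condition. The sketch compares a hypothetical in-place, mask-based conditional add (in the spirit of mpn_cnd_add_n, which takes separate source operands) with the addmul_1 route. All names are illustrative, not GMP API, and whether the multiply route is actually silent still depends on the multiplier having data-independent timing on the target CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* hypothetical 32-bit limbs */

/* Portable stand-in for mpn_addmul_1: rp[] += up[] * v, returns carry limb. */
static limb_t ref_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    assert(n > 0);
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t t = (uint64_t)up[i] * v + rp[i] + cy;
        rp[i] = (limb_t)t;
        cy = (limb_t)(t >> 32);
    }
    return cy;
}

/* Mask-based conditional add, in-place: rp[] += (cnd ? up[] : 0). */
static limb_t ref_cnd_add_inplace(limb_t cnd, limb_t *rp, const limb_t *up, size_t n)
{
    limb_t mask = -(limb_t)(cnd != 0);   /* all ones or all zeros */
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t a = up[i] & mask;
        limb_t s = rp[i] + a;
        limb_t c1 = s < a;               /* carry from rp[i] + a */
        limb_t r = s + cy;
        limb_t c2 = r < cy;              /* carry from adding the incoming carry */
        rp[i] = r;
        cy = c1 | c2;                    /* at most one of c1, c2 is set */
    }
    return cy;
}

/* Same operation via addmul_1 with a 0/1 multiplier. */
static limb_t cnd_add_via_addmul_1(limb_t cnd, limb_t *rp, const limb_t *up, size_t n)
{
    return ref_addmul_1(rp, up, n, (limb_t)(cnd != 0));
}

/* Returns 1 if both routes give identical limbs and carries. */
static int cnd_variants_agree(uint32_t seed, size_t n, limb_t cnd)
{
    enum { MAX_N = 16 };
    limb_t up[MAX_N], r1[MAX_N], r2[MAX_N];
    if (n == 0 || n > MAX_N)
        return 0;
    for (size_t i = 0; i < n; i++) {
        seed = seed * 1664525u + 1013904223u;   /* simple LCG for test data */
        up[i] = seed;
        seed = seed * 1664525u + 1013904223u;
        r1[i] = r2[i] = seed;
    }
    limb_t c1 = ref_cnd_add_inplace(cnd, r1, up, n);
    limb_t c2 = cnd_add_via_addmul_1(cnd, r2, up, n);
    if (c1 != c2)
        return 0;
    for (size_t i = 0; i < n; i++)
        if (r1[i] != r2[i])
            return 0;
    return 1;
}
```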
> A similar situation is that addmul_1/submul_1 is sometimes faster than
And in that case, it would be nice to have some configure magic to
disable the lsh_1 functions and use addmul_1/submul_1 instead.
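The same substitution for shift-and-add, assuming the lsh_1 functions are of the kind that compute u + 2*v: multiplying by the constant 2 with addmul_1 produces the same limbs and the same high limb as shifting and adding. This in-place sketch uses hypothetical names (ref_addlsh1_inplace and friends), not GMP's internal routines, which take separate source operands.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* hypothetical 32-bit limbs */

/* Portable stand-in for mpn_addmul_1: rp[] += up[] * v, returns carry limb. */
static limb_t ref_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    assert(n > 0);
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t t = (uint64_t)up[i] * v + rp[i] + cy;
        rp[i] = (limb_t)t;
        cy = (limb_t)(t >> 32);
    }
    return cy;
}

/* Shift-and-add, in-place: rp[] += 2 * up[], returns the high limb (0..2). */
static limb_t ref_addlsh1_inplace(limb_t *rp, const limb_t *up, size_t n)
{
    limb_t shift_in = 0;   /* top bit shifted out of the previous limb */
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t d = (limb_t)(up[i] << 1) | shift_in;   /* limb i of 2*u */
        shift_in = up[i] >> 31;
        limb_t s = rp[i] + d;
        limb_t c1 = s < d;
        limb_t r = s + cy;
        limb_t c2 = r < cy;
        rp[i] = r;
        cy = c1 | c2;
    }
    return cy + shift_in;  /* add carry plus the bit shifted out of the top */
}

/* Returns 1 if both routes give identical limbs and high limbs. */
static int lsh1_variants_agree(uint32_t seed, size_t n)
{
    enum { MAX_N = 16 };
    limb_t up[MAX_N], r1[MAX_N], r2[MAX_N];
    if (n == 0 || n > MAX_N)
        return 0;
    for (size_t i = 0; i < n; i++) {
        seed = seed * 1664525u + 1013904223u;   /* simple LCG for test data */
        up[i] = seed;
        seed = seed * 1664525u + 1013904223u;
        r1[i] = r2[i] = seed;
    }
    limb_t h1 = ref_addlsh1_inplace(r1, up, n);
    limb_t h2 = ref_addmul_1(r2, up, n, 2);     /* multiply by 2 instead of shifting */
    if (h1 != h2)
        return 0;
    for (size_t i = 0; i < n; i++)
        if (r1[i] != r2[i])
            return 0;
    return 1;
}
```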
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.