ARM public key benchmark

Wed Apr 3 12:26:50 CEST 2013

nisse at lysator.liu.se (Niels Möller) writes:

> 2. cnd_add_n should be at least as fast as addmul_1, shouldn't it? It
>    appears to be 0.25 c/l faster for larger operands, so maybe it's "only"
>    a question of optimizing loop setup and feedin?

For large operands, it's strictly between add_n and addmul_1, which I
guess is as expected. For small sizes, I had a look at the loop setup
for add_n, which checks bits 0 and 1 of n separately. If that's faster,
maybe one could borrow that logic.
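
To make the loop setup idea concrete, here is a rough C sketch of an add_n that tests bits 0 and 1 of n separately before entering a 4-way unrolled loop. This is not the actual add_n assembly, which isn't shown in this message; the names add_limb and add_n_sketch, the 32-bit limb type, and the 4-way unrolling are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Add one limb with carry in and carry out (what a single adcs does). */
static uint32_t add_limb(uint32_t *r, uint32_t a, uint32_t b, uint32_t carry)
{
  uint64_t s = (uint64_t) a + b + carry;
  *r = (uint32_t) s;
  return (uint32_t) (s >> 32);
}

/* Hypothetical add_n: r = a + b over n limbs, returns the carry out.
   Assumes a 4-way unrolled main loop. */
static uint32_t add_n_sketch(uint32_t *rp, const uint32_t *ap,
                             const uint32_t *bp, size_t n)
{
  uint32_t carry = 0;
  size_t i = 0;

  if (n & 1) {                  /* bit 0 of n: one odd limb */
    carry = add_limb(&rp[i], ap[i], bp[i], carry);
    i += 1;
  }
  if (n & 2) {                  /* bit 1 of n: one pair of limbs */
    carry = add_limb(&rp[i], ap[i], bp[i], carry);
    carry = add_limb(&rp[i + 1], ap[i + 1], bp[i + 1], carry);
    i += 2;
  }
  for (; i < n; i += 4) {       /* remaining count is a multiple of 4 */
    carry = add_limb(&rp[i], ap[i], bp[i], carry);
    carry = add_limb(&rp[i + 1], ap[i + 1], bp[i + 1], carry);
    carry = add_limb(&rp[i + 2], ap[i + 2], bp[i + 2], carry);
    carry = add_limb(&rp[i + 3], ap[i + 3], bp[i + 3], carry);
  }
  return carry;
}
```

The point of the two separate tests is that the leftover limbs (n mod 4) are handled with two branches and no counting loop, so small n never pays for a general-purpose prologue.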

I also wonder if there are any other tricks to speed up cnd_add_n. As
far as I understand, shifts on ARM with the count in a register don't
truncate the count to 5 bits (0-31), so one could perhaps replace

   bic	b, b, cnd		C zero for true, all ones for false
   adcs	r, a, b

with

   adcs	r, a, b, lsl cnd	C zero for true, 32 for false

(This assumes that timing and internal dependencies are independent of
the shift count.) I played a little with that, but I have seen no speed
improvement so far.
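
For reference, a minimal C model of the two variants, assuming 32-bit limbs. The function names, the argument order, and the helper arm_lsl are made up for this sketch and are not the real cnd_add_n interface. C leaves a 32-bit shift by 32 undefined, so arm_lsl spells out the ARM behaviour for a register-specified LSL: only the bottom byte of the count register is used, and counts of 32 or more give zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models ARM LSL with the count in a register (hypothetical helper). */
static uint32_t arm_lsl(uint32_t x, uint32_t count)
{
  count &= 0xff;                        /* only the bottom byte is used */
  return count < 32 ? x << count : 0;   /* 32 or more shifts everything out */
}

/* Variant 1: cnd is zero to add, all ones to skip (the bic form). */
static uint32_t cnd_add_n_mask(uint32_t cnd, uint32_t *rp,
                               const uint32_t *ap, const uint32_t *bp,
                               size_t n)
{
  uint32_t carry = 0;
  for (size_t i = 0; i < n; i++) {
    uint32_t b = bp[i] & ~cnd;                  /* bic  b, b, cnd */
    uint64_t s = (uint64_t) ap[i] + b + carry;  /* adcs r, a, b */
    rp[i] = (uint32_t) s;
    carry = (uint32_t) (s >> 32);
  }
  return carry;
}

/* Variant 2: cnd is zero to add, 32 to skip (the shifted-operand form). */
static uint32_t cnd_add_n_shift(uint32_t cnd, uint32_t *rp,
                                const uint32_t *ap, const uint32_t *bp,
                                size_t n)
{
  uint32_t carry = 0;
  assert(cnd == 0 || cnd == 32);
  for (size_t i = 0; i < n; i++) {
    uint32_t b = arm_lsl(bp[i], cnd);           /* b, lsl cnd */
    uint64_t s = (uint64_t) ap[i] + b + carry;  /* adcs r, a, b, lsl cnd */
    rp[i] = (uint32_t) s;
    carry = (uint32_t) (s >> 32);
  }
  return carry;
}
```

Both variants produce the same limbs and carry; the second only folds the masking into the operand shift, saving one instruction per limb. Whether that helps on real hardware depends on how the core schedules register-shifted operands.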

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.