ARM public key benchmark

Torbjorn Granlund tg at
Wed Apr 3 15:11:39 CEST 2013

nisse at (Niels Möller) writes:

  nisse at (Niels Möller) writes:
  > So it should be doable with the addmul_1 loop and two additional,
  > non-recurrency, NOT (mvn) instructions per limb, and then maybe some extra
  > logic for the return value. One could aim for 4.25 c/l, I guess.
  The below seems to give correct results. But still 5.25 c/l. Maybe
  scheduling can be improved, I just put the new mvn instructions
  immediately preceding umaal and str.
The A9 is not a true out-of-order (OoO) design; it wants manual scheduling.

I also suspect the autoincrement of ldr should be replaced by a discrete
pointer update.
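
For reference, a minimal C sketch of the trick in the quoted text. It
assumes the routine under discussion is a submul_1-style loop (rp -= up * v,
returning the borrow) built on an addmul_1 loop with two bitwise NOTs per
limb. The function names and the 32-bit limb type are hypothetical and not
taken from GMP. It relies on the identity ~(~r + u*v) = r - u*v in
fixed-width arithmetic, with the 64-bit multiply-accumulate standing in for
umaal.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: subtract up[0..n-1] * v from rp[0..n-1] using an
   addmul-style loop plus two NOTs per limb. Returns the borrow-out limb. */
static uint32_t submul_1_via_not(uint32_t *rp, const uint32_t *up,
                                 size_t n, uint32_t v)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t r = ~rp[i];                            /* first NOT (mvn) */
        /* umaal-like step: u*v + r + carry always fits in 64 bits */
        uint64_t t = (uint64_t)up[i] * v + r + carry;
        rp[i] = ~(uint32_t)t;                           /* second NOT, then store */
        carry = (uint32_t)(t >> 32);
    }
    return carry;   /* carry of the complemented sum equals the borrow */
}

/* Straightforward reference version, for comparison only. */
static uint32_t submul_1_ref(uint32_t *rp, const uint32_t *up,
                             size_t n, uint32_t v)
{
    uint32_t borrow = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t p = (uint64_t)up[i] * v + borrow;      /* product plus borrow-in */
        uint32_t lo = (uint32_t)p;
        uint32_t hi = (uint32_t)(p >> 32);
        uint32_t r = rp[i];
        rp[i] = r - lo;
        borrow = hi + (r < lo);                         /* propagate the borrow */
    }
    return borrow;
}
```

Neither NOT sits on the carry chain, which matches the "non-recurrency"
remark above. In this sketch the return value needs no extra logic, since
the final carry is already the borrow. An assembly version may differ.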

