ARM public key benchmark

Niels Möller nisse at
Thu Apr 4 15:59:04 CEST 2013

Torbjorn Granlund <tg at> writes:

> The newer sparc adds 64-bit carrying adds, but they still don't have
> corresponding subtraction instructions.  So David sets carry before
> entering the loop, and one's-complements the subtrahend.

Ah. I think I even suggested that trick, for mpn_sub_n. Not obviously
applicable to submul_1, though, which made me think you were referring
to something different.
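The trick rests on the identity a - b = a + ~b + 1: setting carry before the loop supplies the +1, and the carry out of the top limb is 1 exactly when no borrow occurs. A minimal C sketch, assuming 32-bit limbs; sub_n_via_add is a hypothetical name, and this is not GMP's mpn_sub_n code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: rp = ap - bp over n 32-bit limbs, using only
   additions with carry. Returns the borrow out (0 or 1). */
static uint32_t sub_n_via_add(uint32_t *rp, const uint32_t *ap,
                              const uint32_t *bp, size_t n)
{
    uint32_t carry = 1;              /* carry set before entering the loop */
    assert(n >= 1);                  /* this sketch assumes at least one limb */
    for (size_t i = 0; i < n; i++) {
        /* Add the one's complement of the subtrahend limb plus carry-in. */
        uint64_t s = (uint64_t)ap[i] + (uint32_t)~bp[i] + carry;
        rp[i] = (uint32_t)s;
        carry = (uint32_t)(s >> 32);
    }
    return 1 - carry;                /* final carry of 1 means no borrow */
}
```

For example, with a = {0, 1} (least significant limb first, i.e. 2^32) and b = {1, 0}, the result is {0xffffffff, 0} with borrow 0.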

> I assume that ldm loads the registers in some specific order, such as
> lowest numbered first.

I guess it's lowest numbered first (and lowest memory address).

But a loop with

  use	r7  
  ldm	up!, {r4,r5,r6,r7}
  use	r4

looks like poor scheduling between the load of r4 and its use, and the ldm
can't be moved earlier since it clobbers r7. But I have a pretty vague
idea about how this really works.
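To make the register/address mapping concrete, here is a C model of the architectural effect of "ldm up!, {r4,r5,r6,r7}": listed registers are filled in increasing register number from increasing addresses, and the "!" writeback advances up past the loaded words. It models only the final result, not when each register becomes available, which is exactly the scheduling question above. The regfile type and the ldm_ia_wb name are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register-file model: r[k] stands for ARM register rk. */
typedef struct { uint32_t r[16]; } regfile;

/* Models "ldm up!, {reglist}" (increment-after, with writeback).
   reglist is a bit mask: bit k set means rk is in the list. */
static void ldm_ia_wb(regfile *rf, const uint32_t **up, unsigned reglist)
{
    const uint32_t *p = *up;
    assert(reglist != 0 && reglist <= 0xffffu);  /* empty list is not allowed */
    for (unsigned k = 0; k < 16; k++)
        if (reglist & (1u << k))
            rf->r[k] = *p++;  /* lowest-numbered register gets lowest address */
    *up = p;                  /* writeback: pointer now follows the last word */
}
```

With memory {10, 11, 12, 13, ...} at up, the call with r4 to r7 in the list leaves r4 = 10 and r7 = 13, and up advanced by four words.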

> Then, it could lift the scoreboard bit for
> available register values while ldm executes.

I'm not sure I understand, but if ldm is executed in the background,
with r4 loaded first, there's still a tight dependency between the load
of r4 and its use.
From what I've learnt so far about A9 scheduling, I'd want to place the
load of r4 before the last use of r7, and then I can't use ldm.

> Using ldm with just two registers might be pointless.  Also, it will for
> 50% of alignments take 2 cycles.  Doing three registers is (as we've
> discussed in the past) more appealing.

Right, rewriting the loop with 3-way unrolling would be an interesting
experiment. But I don't think I'll look into that soon.
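For reference, the shape of a 3-way unrolled submul_1 loop can be sketched in C, assuming 32-bit limbs and a 64-bit product type; the names are hypothetical and this is not GMP's code. In assembly, each iteration's three up limbs would be the natural candidates for a single three-register ldm. C cannot express that, so this only shows the loop structure and the per-limb carry/borrow handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb;   /* 32-bit limbs, as on ARM */

/* One limb step: rp[0] -= up[0] * v + cy; returns the new carry/borrow. */
static inline limb submul_step(limb *rp, const limb *up, limb v, limb cy)
{
    uint64_t p = (uint64_t)up[0] * v + cy;   /* at most 2^64 - 2^32, no overflow */
    limb plo = (limb)p, phi = (limb)(p >> 32);
    limb r = rp[0];
    rp[0] = r - plo;
    return phi + (r < plo);                  /* add borrow from the low subtraction */
}

/* Hypothetical name: {rp, n} -= {up, n} * v; returns the high borrow limb. */
static limb submul_1_unroll3(limb *rp, const limb *up, size_t n, limb v)
{
    limb cy = 0;
    size_t i = 0;
    assert(n >= 1);
    /* Peel n mod 3 limbs so the main loop always handles exactly three. */
    for (; i < n % 3; i++)
        cy = submul_step(rp + i, up + i, v, cy);
    for (; i < n; i += 3) {
        cy = submul_step(rp + i,     up + i,     v, cy);
        cy = submul_step(rp + i + 1, up + i + 1, v, cy);
        cy = submul_step(rp + i + 2, up + i + 2, v, cy);
    }
    return cy;
}
```

Peeling the leftover limbs before the main loop is one choice; handling them after the loop works equally well, and which is better on A9 would depend on how the loads get scheduled.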


Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.

More information about the gmp-devel mailing list