ARM public key benchmark
Niels Möller
nisse at lysator.liu.se
Thu Apr 4 15:59:04 CEST 2013
Torbjorn Granlund <tg at gmplib.org> writes:
> The newer SPARC adds 64-bit carrying adds, but it still doesn't have
> corresponding subtraction instructions. So David sets carry before
> entering the loop, and one's-complements the subtrahend.
Ah. I think I even suggested that trick, for mpn_sub_n. Not obviously
applicable to submul_1, though, which made me think you were referring
to something different.
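For readers unfamiliar with the trick: since u - v = u + ~v + 1 modulo 2^(32n), a subtract-with-borrow loop can be built from add-with-carry alone, by entering the loop with carry set and complementing each subtrahend limb. The carry out of the top limb is then the inverse of the borrow. Below is a minimal C sketch of an mpn_sub_n-style loop written that way, assuming 32-bit limbs. The names limb_t and sub_n_via_add are hypothetical, for illustration only; this is not GMP's code.

```c
#include <stdint.h>
#include <stddef.h>

typedef uint32_t limb_t; /* assumption: 32-bit limbs, as on ARM */

/* Computes rp = up - vp over n limbs and returns the borrow (0 or 1),
   using only addition with carry: u - v = u + ~v + 1. */
limb_t sub_n_via_add(limb_t *rp, const limb_t *up, const limb_t *vp, size_t n)
{
    limb_t cy = 1; /* carry set before the loop supplies the "+1" */
    for (size_t i = 0; i < n; i++) {
        /* add the one's complement of the subtrahend limb plus carry */
        uint64_t s = (uint64_t)up[i] + (limb_t)~vp[i] + cy;
        rp[i] = (limb_t)s;
        cy = (limb_t)(s >> 32);
    }
    /* carry out of the addition is the inverse of the borrow */
    return 1 - cy;
}
```

For example, with up = {0, 1} (that is, 2^32) and vp = {1, 0}, the result is {0xFFFFFFFF, 0} with no borrow.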
> I assume that ldm loads the registers in some specific order, such as
> lowest numbered first.
I guess it's lowest numbered first (the lowest-numbered register also
gets the lowest memory address).
But a loop with

  use r7
  ldm up!, {r4,r5,r6,r7}
  use r4

looks like poor scheduling between the load of r4 and its use, and the
ldm can't be moved earlier since it clobbers r7. But I have only a
pretty vague idea of how this really works.
> Then, it could lift the scoreboard bit for
> available register values while ldm executes.
I'm not sure I understand, but even if ldm is executed in the
background, with r4 loaded first, there's still a tight dependency
between the load of r4 and its use. From what I've learnt so far about
A9 scheduling, I'd want to place the load of r4 before the last use of
r7, and then I can't use ldm.
> Using ldm with just two registers might be pointless. Also, for 50% of
> alignments it will take 2 cycles. Doing three registers is (as we've
> discussed in the past) more appealing.
Right, rewriting the loop with 3-way unrolling would be an interesting
experiment. But I don't think I'll look into that soon.
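A rough sketch of the loop structure a 3-way unrolled submul_1 would have, written in C rather than ARM assembly and assuming 32-bit limbs: each iteration fetches three limbs of up together (the part a three-register ldm would do) before the three multiply-subtract steps, and a short wind-down loop handles the 0 to 2 leftover limbs. Whether a compiler actually emits ldm, or how it schedules the loads, is not controlled by this code. The names submul_step and submul_1_unroll3 are hypothetical; this is not GMP's mpn_submul_1.

```c
#include <stdint.h>
#include <stddef.h>

typedef uint32_t limb_t; /* assumption: 32-bit limbs */

/* One step: *r -= u * v + cy; returns the new borrow limb. */
static inline limb_t submul_step(limb_t *r, limb_t u, limb_t v, limb_t cy)
{
    uint64_t p = (uint64_t)u * v + cy; /* cannot overflow 64 bits */
    limb_t lo = (limb_t)p;
    limb_t hi = (limb_t)(p >> 32);
    limb_t old = *r;
    *r = old - lo;
    return hi + (old < lo); /* add the borrow from the low-limb subtract */
}

/* rp[0..n-1] -= up[0..n-1] * v; returns the borrow limb. */
limb_t submul_1_unroll3(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t cy = 0;
    size_t i = 0;
    /* main loop: load three limbs up front, like a three-register ldm */
    for (; i + 3 <= n; i += 3) {
        limb_t u0 = up[i], u1 = up[i + 1], u2 = up[i + 2];
        cy = submul_step(&rp[i], u0, v, cy);
        cy = submul_step(&rp[i + 1], u1, v, cy);
        cy = submul_step(&rp[i + 2], u2, v, cy);
    }
    /* wind-down: the remaining n mod 3 limbs */
    for (; i < n; i++)
        cy = submul_step(&rp[i], up[i], v, cy);
    return cy;
}
```

In real assembly the unroll factor also fixes how many registers each ldm fills, which is why three limbs per iteration fits the ldm discussion above better than two.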
Regards,
/Niels
--
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.