Some ARM Cortex-A8 improvements
Torbjorn Granlund
tg at gmplib.org
Tue Apr 24 00:32:35 CEST 2012
Richard Henderson <rth at twiddle.net> writes:
  On 04/23/12 07:49, Torbjorn Granlund wrote:
  > Do you know the repeat rate of umull, umlal, umaal, assuming no reg
  > dependencies?

  For a8: 3 cycles.

For the A9 it seems to be 2 cycles, so the 3.25 c/l (cycles per limb) of
the current addmul_1 is not very good.
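To make the c/l arithmetic concrete, here is a portable C model of what umaal computes, and of an addmul_1 loop written with one umaal per limb. This is a sketch assuming 32-bit limbs; `umaal_model` and `ref_addmul_1` are hypothetical names, not GMP's actual code. If such a loop is bound by umaal issue, a 2-cycle repeat rate puts the floor near 2 c/l, before loads and stores are counted.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of ARM "umaal RdLo, RdHi, Rn, Rm":
   RdHi:RdLo = Rn * Rm + RdLo + RdHi, all unsigned.
   The sum is at most (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1, so it never overflows. */
static void umaal_model(uint32_t *lo, uint32_t *hi, uint32_t n, uint32_t m)
{
    uint64_t t = (uint64_t)n * m + *lo + *hi;
    *lo = (uint32_t)t;
    *hi = (uint32_t)(t >> 32);
}

/* Hypothetical reference addmul_1: {rp,len} += {up,len} * v.
   Returns the final carry limb. Each limb needs exactly one umaal:
   the old rp[i] goes in RdLo and the running carry in RdHi. */
static uint32_t ref_addmul_1(uint32_t *rp, const uint32_t *up, size_t len,
                             uint32_t v)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < len; i++) {
        uint32_t lo = rp[i];
        umaal_model(&lo, &carry, up[i], v); /* one umaal per limb */
        rp[i] = lo;
    }
    return carry;
}
```

Because the carry passes through RdHi of the next umaal, a real loop would presumably need to interleave several limbs (as the timing loop below does with separate register pairs) to avoid being bound by umaal latency rather than its repeat rate.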
I have found no timing docs, so I measured it myself:
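        .text
        .global main
main:   push    {r4-r8}                 @ save callee-saved registers
        mov     r12, #0x3b800000        @ loop count (998244352 iterations)
1:      subs    r12, r12, #1            @ decrement counter, set flags for bne
        umaal   r0, r1, r14, r14        @ r1:r0 = r14*r14 + r0 + r1
        umaal   r2, r3, r14, r14        @ the four umaal use disjoint
        umaal   r4, r5, r14, r14        @ register pairs, so they do not
        umaal   r6, r7, r14, r14        @ depend on each other
        bne     1b
        pop     {r4-r8}
        bx      lr                      @ r14 (lr) is only read above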
This loop takes about 9 cycles per iteration, or 2.25 cycles per umaal.
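The 2.25 figure is the per-iteration cycle count divided by the four umaal instructions in the loop body. A small C helper, assuming the run was timed in seconds and the core clock is known (neither is given above; the function names are hypothetical), turns a wall-clock measurement into these numbers:

```c
#include <stdint.h>

/* Hypothetical helper: cycles spent per loop iteration, given elapsed
   wall-clock time and an externally known clock frequency. */
static double cycles_per_iteration(double seconds, double clock_hz,
                                   uint64_t iterations)
{
    return seconds * clock_hz / (double)iterations;
}

/* Cycles per instruction of interest, charging all loop overhead
   (here subs and bne) to those instructions. */
static double cycles_per_op(double seconds, double clock_hz,
                            uint64_t iterations, unsigned ops_per_iteration)
{
    return cycles_per_iteration(seconds, clock_hz, iterations)
           / (double)ops_per_iteration;
}
```

With the 998244352 iterations that #0x3b800000 gives and 4 umaal per iteration, 9 cycles per iteration works out to 9 / 4 = 2.25 cycles per umaal. Since subs and bne are charged to the umaal instructions here, 2.25 presumably slightly overstates the umaal cost, which fits reading it as a 2-cycle repeat rate.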
The latency is 3 cycles (found by using r0,r1 as the accumulator pair for
every umaal above, so that each umaal depends on the previous one).
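Why the dependent variant exposes the latency can be seen with a deliberately crude model (an assumption for illustration, not taken from any timing manual): one iteration costs at least its issue limit, ops * repeat interval, and at least its longest dependency chain, chain length * latency. `iteration_cycles_bound` is a hypothetical name, and the model ignores the subs/bne overhead.

```c
/* Crude lower bound on cycles per loop iteration (hypothetical model):
   - issue bound: ops that must issue, spaced by the repeat interval;
   - chain bound: dependent ops in a row, each waiting for the latency.
   The iteration cannot be faster than either, so take the maximum. */
static unsigned iteration_cycles_bound(unsigned ops, unsigned repeat_interval,
                                       unsigned chain_length, unsigned latency)
{
    unsigned issue_bound = ops * repeat_interval;
    unsigned chain_bound = chain_length * latency;
    return issue_bound > chain_bound ? issue_bound : chain_bound;
}
```

For the loop as written, iteration_cycles_bound(4, 2, 1, 3) gives 8 cycles (issue-bound; the measured 9 presumably includes subs and bne). With r0,r1 used by all four umaal the chain length becomes 4, and iteration_cycles_bound(4, 2, 4, 3) gives 12 cycles, i.e. 3 cycles per umaal, which is the latency.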
--
Torbjörn