Some arm cortex-a8 improvements

Torbjorn Granlund tg at
Tue Apr 24 00:32:35 CEST 2012

Richard Henderson <rth at> writes:

  On 04/23/12 07:49, Torbjorn Granlund wrote:
  > Do you know the repeat rate of umull, umlal, umaal, assuming no reg
  > dependencies?
  For a8: 3 cycles.
For a9 it seems to be 2 cycles, so the 3.25 c/l (cycles per limb) of the
current addmul_1 is not very good.
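The umaal repeat rate matters for addmul_1 because each limb of addmul_1 maps onto a single umaal. Below is a minimal C sketch of that correspondence. It assumes 32-bit limbs. The names umaal and ref_addmul_1 are hypothetical, and this is a reference model, not GMP's assembly code.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of the ARM UMAAL instruction: hi:lo = rn*rm + lo + hi.
   The sum cannot overflow 64 bits, since
   (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1. */
static void umaal(uint32_t *lo, uint32_t *hi, uint32_t rn, uint32_t rm)
{
    uint64_t t = (uint64_t)rn * rm + *lo + *hi;
    *lo = (uint32_t)t;          /* low 32 bits  */
    *hi = (uint32_t)(t >> 32);  /* high 32 bits */
}

/* Hypothetical reference model of addmul_1 with 32-bit limbs:
   {rp,n} += {up,n} * v, returning the carry-out limb.
   Each limb costs exactly one umaal: the old rp[i] is the low
   accumulator and the running carry is the high one. */
static uint32_t ref_addmul_1(uint32_t *rp, const uint32_t *up,
                             size_t n, uint32_t v)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t lo = rp[i];
        umaal(&lo, &carry, up[i], v);  /* carry:lo = up[i]*v + rp[i] + carry */
        rp[i] = lo;
    }
    return carry;
}
```

Because every limb costs one umaal, the umaal repeat rate is a floor on c/l for a loop built this way. The carry passed from one umaal to the next also forms a dependency chain, so umaal latency matters as well.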

I have found no timing docs, so I measured it myself:

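	.global	main
main:	push	{r4-r8}			@ save callee-saved registers
	mov	r12, #0x3b800000	@ iteration count (998244352)

1:	subs	r12, r12, #1		@ decrement counter, set flags for bne
	umaal	r0, r1, r14, r14	@ r1:r0 = r14*r14 + r0 + r1
	umaal	r2, r3, r14, r14	@ four independent accumulator pairs,
	umaal	r4, r5, r14, r14	@ so the loop measures repeat rate
	umaal	r6, r7, r14, r14	@ rather than latency
	bne	1b

	pop	{r4-r8}			@ restore registers
	bx	lr			@ lr (r14) is only read above, so still valid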

Each iteration of this loop takes about 9 cycles, or 2.25 cycles per umaal.

The latency is 3 cycles (found by using r0,r1 for every umaal above, so
that each umaal depends on the result of the previous one).
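The two measurements fit a simple throughput/latency model. Suppose each iteration has n umaals split into k independent accumulator chains, with repeat interval T and latency L. The iteration then takes roughly max(n*T, (n/k)*L) cycles. Below is a C sketch of that model. The model and the function name cycles_per_iter are assumptions. The sketch ignores the subs/bne overhead and any other pipeline limits.

```c
/* Hypothetical steady-state model: n umaal instructions per loop
   iteration, arranged as k independent dependency chains (n is assumed
   to be a multiple of k). The multiplier accepts one umaal every
   `interval` cycles, and each umaal waits `latency` cycles for the
   previous umaal in its chain. Loop overhead is ignored. */
static double cycles_per_iter(int n, int k, double latency, double interval)
{
    double issue_bound = n * interval;               /* multiplier busy */
    double chain_bound = (double)(n / k) * latency;  /* longest chain   */
    return issue_bound > chain_bound ? issue_bound : chain_bound;
}
```

Take T = 2.25 and L = 3. The four-chain loop is then throughput-bound at 9 cycles per iteration. Putting every umaal on r0,r1 (k = 1) makes it latency-bound at 12 cycles, that is, 3 per umaal. This is why the two variants of the loop measure different things.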

