arm "neon"

Torbjorn Granlund tg at gmplib.org
Mon Jan 14 12:43:06 CET 2013


nisse at lysator.liu.se (Niels Möller) writes:

  In Chapter 3, multiplication instructions are listed in a table starting on
  page "3-14". But now I see I read the entry for a smaller data size. For
  32-bit inputs, it's apparently 2 cycles, not 1.
  
It seems to be 2 cycles indeed:

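	.text
	.fpu	neon			@ accept NEON mnemonics without -mfpu=neon
	.globl	main
	.type	main, #function
main:
	vpush	{d8-d13}		@ q4/q6 overlap callee-saved d8-d15
	mov	r0, #1006632960		@ iteration count (0x3C000000)
1:	subs	r0, r0, #1
	vmull.u32	q2, d0, d0	@ each: 2 x (32 x 32 -> 64) products
	vmull.u32	q4, d0, d0	@ distinct destinations, so the four
	vmull.u32	q6, d0, d0	@ multiplies are independent and the
	vmull.u32	q8, d0, d0	@ loop measures throughput, not latency
	bne	1b
	vpop	{d8-d13}
	bx	lr			@ r0 is 0 here, so exit status is 0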

But IIUC, we are thus performing one 32 x 32 -> 64 multiply per cycle.
Can one fit an addition in here without consuming extra cycles?
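A rough count behind the "one per cycle" figure, assuming that the 2 cycles
is the issue interval of each vmull.u32 and that the subs/bne pair is hidden
under the multiplies. Each vmull.u32 qN, d0, d0 multiplies both 32-bit lanes
of d0, giving two 64-bit products:

	\[
	  \frac{4\ \text{vmull} \times 2\ \text{products}}
	       {4\ \text{vmull} \times 2\ \text{cycles}}
	  = \frac{8\ \text{products}}{8\ \text{cycles}}
	  = 1\ \text{product/cycle}
	\]

Under the same assumptions the whole run should take about
1006632960 x 8 = 8053063680 cycles, which gives a check against wall-clock
time at a known clock rate.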

-- 
Torbjörn


More information about the gmp-devel mailing list