arm "neon"
Torbjorn Granlund
tg at gmplib.org
Mon Jan 14 12:43:06 CET 2013
nisse at lysator.liu.se (Niels Möller) writes:
> In Chapter 3, multiplication instructions are listed in a table starting on
> page 3-14. But now I see I read the entry for a smaller data size. For
> 32-bit inputs, it's apparently 2 cycles, not 1.
It seems to be 2 cycles indeed:
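        .text
        .globl  main
        .type   main, #function
main:
        mov     r0, #1006632960         @ iteration count (0x3c000000)
1:      subs    r0, r0, #1
        vmull.u32 q2, d0, d0            @ each vmull.u32: two 32 x 32 -> 64 products
        vmull.u32 q4, d0, d0            @ distinct destinations, so the four
        vmull.u32 q6, d0, d0            @ multiplies do not depend on each other
        vmull.u32 q8, d0, d0
        bne     1b
        mov     pc, lr                  @ return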
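To make the arithmetic behind the loop above explicit, here is a portable C sketch. It is not the benchmark itself; the types `dreg_model`/`qreg_model` and the functions `vmull_u32_model`, `loop_cycles` and `products_per_cycle` are hypothetical names for illustration. The cycle estimate assumes each vmull.u32 ties up the multiplier for 2 cycles and that subs/bne cost nothing extra; that is an assumption, not a measurement.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register models: a NEON D register holds two 32-bit lanes,
   a Q register holds two 64-bit lanes. */
typedef struct { uint32_t lane[2]; } dreg_model;
typedef struct { uint64_t lane[2]; } qreg_model;

/* Scalar model of "vmull.u32 qD, dN, dM": each unsigned 32-bit lane of dN
   is multiplied by the matching lane of dM, giving a full 64-bit product. */
static qreg_model vmull_u32_model(dreg_model n, dreg_model m)
{
    qreg_model r;
    for (int i = 0; i < 2; i++)
        r.lane[i] = (uint64_t)n.lane[i] * m.lane[i];
    return r;
}

/* Expected cycles for the loop, assuming every vmull.u32 occupies the
   multiplier for cycles_per_vmull cycles and subs/bne are fully hidden. */
static uint64_t loop_cycles(uint64_t iterations, uint64_t vmull_per_iter,
                            uint64_t cycles_per_vmull)
{
    return iterations * vmull_per_iter * cycles_per_vmull;
}

/* 32 x 32 -> 64 products completed per cycle: two lanes per vmull.u32. */
static double products_per_cycle(uint64_t cycles_per_vmull)
{
    assert(cycles_per_vmull > 0);
    return 2.0 / (double)cycles_per_vmull;
}
```

With the loop's count, `loop_cycles(1006632960, 4, 2)` gives 8053063680 cycles, and `products_per_cycle(2)` gives 1.0: two products per instruction at two cycles per instruction.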
But IIUC, since each vmull.u32 produces two 32 x 32 -> 64 products, we are
still performing one such multiply per cycle.
Can one fit additions in here without consuming extra cycles?
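A scalar C model can show what "sticking an addition in" has to accomplish arithmetically. NEON's vmlal.u32 is the accumulating form of vmull.u32 (each 64-bit lane gets the product added to it); whether it issues at the same 2-cycle rate is exactly the open question, and this sketch does not answer it. The second function is a reference loop in the spirit of GMP's mpn_addmul_1 on 32-bit limbs. All names here (`vmlal_u32_model`, `addmul_1_ref`, the register model types) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register models: a D register holds two 32-bit lanes,
   a Q register two 64-bit lanes. */
typedef struct { uint32_t lane[2]; } dreg_model;
typedef struct { uint64_t lane[2]; } qreg_model;

/* Scalar model of "vmlal.u32 qD, dN, dM": add the full 64-bit product of
   each pair of 32-bit lanes into the matching accumulator lane, mod 2^64. */
static void vmlal_u32_model(qreg_model *acc, dreg_model n, dreg_model m)
{
    for (int i = 0; i < 2; i++)
        acc->lane[i] += (uint64_t)n.lane[i] * m.lane[i];
}

/* Worst case: (2^32-1)^2 + 2*(2^32-1) == 2^64-1, so one product plus two
   32-bit addends always fits in 64 bits. */
static_assert((uint64_t)UINT32_MAX * UINT32_MAX + 2u * (uint64_t)UINT32_MAX
              == UINT64_MAX, "product plus two limbs fits in 64 bits");

/* rp[0..n-1] += up[0..n-1] * v on 32-bit limbs; returns the carry-out limb. */
static uint32_t addmul_1_ref(uint32_t *rp, const uint32_t *up, int n, uint32_t v)
{
    uint32_t carry = 0;
    for (int i = 0; i < n; i++) {
        uint64_t t = (uint64_t)up[i] * v + rp[i] + carry;  /* cannot overflow */
        rp[i] = (uint32_t)t;
        carry = (uint32_t)(t >> 32);
    }
    return carry;
}
```

The bound matters for how accumulation is arranged: a 64-bit lane can absorb one product plus two 32-bit values, but adding a second full product can wrap, so carries would have to be split off before that point.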
--
Torbjörn