tg at gmplib.org
Fri Feb 22 19:20:16 CET 2013
Richard Henderson <rth at twiddle.net> writes:
The widening add insns are:
  VADDL   32+32->64     Qd[n] = Dn[n] + Dm[n]
  VADDW   64+32->64     Qd[n] = Qn[n] + Dm[n]
  VPADDL  32+32->64     Qd[n] = Qm[2n] + Qm[2n+1]   ("horizontal add")
  VPADAL  32+32+64->64  Qd[n] += Qm[2n] + Qm[2n+1]
All very useful.
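To make the lane semantics concrete, here is a rough scalar model in portable C. It assumes unsigned .u32 element types. The model_* function names and the use of plain arrays for D and Q registers are hypothetical and chosen only for illustration. They are not the arm_neon.h intrinsics.

```c
#include <stdint.h>

/* Hypothetical scalar model of the NEON widening adds on unsigned 32-bit
   elements (.u32). Registers are modeled as arrays: a D register as
   uint32_t[2], a Q register as uint64_t[2] (64-bit lanes) or
   uint32_t[4] (32-bit lanes). 64-bit sums wrap modulo 2^64. */

/* VADDL.U32 Qd, Dn, Dm: Qd[i] = Dn[i] + Dm[i], computed in 64 bits. */
static void model_vaddl_u32(uint64_t qd[2], const uint32_t dn[2],
                            const uint32_t dm[2])
{
    for (int i = 0; i < 2; i++)
        qd[i] = (uint64_t)dn[i] + dm[i];
}

/* VADDW.U32 Qd, Qn, Dm: Qd[i] = Qn[i] + Dm[i], Dm lanes zero-extended. */
static void model_vaddw_u32(uint64_t qd[2], const uint64_t qn[2],
                            const uint32_t dm[2])
{
    for (int i = 0; i < 2; i++)
        qd[i] = qn[i] + dm[i];
}

/* VPADDL.U32 Qd, Qm: add adjacent pairs, Qd[i] = Qm[2i] + Qm[2i+1]. */
static void model_vpaddl_u32(uint64_t qd[2], const uint32_t qm[4])
{
    for (int i = 0; i < 2; i++)
        qd[i] = (uint64_t)qm[2 * i] + qm[2 * i + 1];
}

/* VPADAL.U32 Qd, Qm: pairwise add and accumulate,
   Qd[i] += Qm[2i] + Qm[2i+1]. */
static void model_vpadal_u32(uint64_t qd[2], const uint32_t qm[4])
{
    for (int i = 0; i < 2; i++)
        qd[i] += (uint64_t)qm[2 * i] + qm[2 * i + 1];
}
```

The point of the widening forms is that two 32-bit lanes can never overflow a 64-bit lane, so carries are kept rather than lost.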
There is a narrowing add insn which might still be interesting:
VADDHN 64+64->32 Dd[n] = (Qn[n] + Qm[n]) >> 32
Useful. Is there any 32+32 >> 32 -> 32, i.e., a carry-out?
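A scalar sketch of VADDHN, together with what the asked-for 32+32 >> 32 -> 32 operation would compute: the carry-out (0 or 1) of a 32-bit add. The helper model_add_carry_out_u32 is hypothetical. It emulates the operation with a 64-bit add and is not a claim that NEON provides a single instruction for it.

```c
#include <stdint.h>

/* VADDHN.I64 Dd, Qn, Qm: Dd[i] = (Qn[i] + Qm[i]) >> 32, i.e. the high
   32 bits of each 64-bit lane sum (the sum itself wraps modulo 2^64). */
static void model_vaddhn_i64(uint32_t dd[2], const uint64_t qn[2],
                             const uint64_t qm[2])
{
    for (int i = 0; i < 2; i++)
        dd[i] = (uint32_t)((qn[i] + qm[i]) >> 32);
}

/* Hypothetical "32+32 >> 32 -> 32": the carry-out of a 32-bit add,
   emulated here by widening to 64 bits before adding. */
static uint32_t model_add_carry_out_u32(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a + b) >> 32);
}
```

Feeding VADDHN lanes that hold zero-extended 32-bit values gives exactly this carry, but that costs a widening step first, which is presumably why a direct 32-bit form would be attractive.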
Suppose you're doing a final sum in a vector and will subsequently shift
the result so that its high half can be added into the next column.
You have two choices:
  (1) vadd.i64    Qc0, Qa0, Qb0
      vsra.u64    Qc1, Qc0, #32

  (2) vaddhn.i64  Dtmp, Qa0, Qb0
      vaddw.u32   Qd1, Qc1, Dtmp
Such a reordering might make the data available as input earlier, or
let you store it away in a single D register rather than keeping a
whole Q register around, easing register pressure.
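A quick scalar check, assuming all lanes are treated as unsigned (VSRA modeled as a logical shift), that the two sequences feed the same value into the next column: both add the high 32 bits of the lane sum Qa0 + Qb0 (mod 2^64) to Qc1. The difference is that the first sequence also leaves the full 64-bit sum in Qc0, while the second only ever materializes the 32-bit high halves in Dtmp. The function names are hypothetical.

```c
#include <stdint.h>

/* Sequence (1): vadd.i64 Qc0, Qa0, Qb0 ; vsra Qc1, Qc0, #32 (unsigned) */
static void model_vadd_vsra(uint64_t c0[2], uint64_t c1[2],
                            const uint64_t a0[2], const uint64_t b0[2])
{
    for (int i = 0; i < 2; i++) {
        c0[i] = a0[i] + b0[i];   /* full 64-bit lane sum, wraps mod 2^64 */
        c1[i] += c0[i] >> 32;    /* logical shift right, then accumulate */
    }
}

/* Sequence (2): vaddhn.i64 Dtmp, Qa0, Qb0 ; vaddw.u32 Qd1, Qc1, Dtmp */
static void model_vaddhn_vaddw(uint64_t d1[2], const uint64_t c1[2],
                               const uint64_t a0[2], const uint64_t b0[2])
{
    uint32_t tmp[2];             /* Dtmp: only the high halves are kept */
    for (int i = 0; i < 2; i++)
        tmp[i] = (uint32_t)((a0[i] + b0[i]) >> 32);
    for (int i = 0; i < 2; i++)
        d1[i] = c1[i] + tmp[i];  /* Dtmp lanes are zero-extended */
}
```

If the low halves of the sum are also needed, sequence (2) has to produce them separately; the payoff is that the value carried forward fits in one D register.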
These instructions really open many possibilities.
I won't be hacking any Neon assembly for the time being, and will instead
hope for a 0.7 c/l (cycles per limb) mul_basecase to appear from some
other GNU hacker. :-)