arm "neon"

Fri Feb 22 19:20:16 CET 2013

Richard Henderson <rth at twiddle.net> writes:

  The widening add insns are:
    VADDL  32+32->64     Qd[n] = Dn[n] + Dm[n]
    VADDW  64+32->64     Qd[n] = Qn[n] + Dm[n]
    VPADDL 32+32->64     Qd[n/2] = Dn[2n] + Dn[2n+1]    ("horizontal add")
    VPADAL 32+32+64->64  Qd[n/2] += Dn[2n] + Dn[2n+1]

All very useful.
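
To make the lane arithmetic in the quoted table concrete, here is a minimal Python sketch of the four widening adds for unsigned 32-bit source lanes. The function names are made up for illustration, registers are modelled as plain lists of integers, and inputs are assumed to already fit their lane width. It is a model of the semantics as described above, not a statement about encodings or timing.

```python
MASK64 = (1 << 64) - 1  # 64-bit destination lanes wrap modulo 2^64

def vaddl_u32(dn, dm):
    """Model of VADDL.U32: 32-bit + 32-bit lanes -> 64-bit lanes."""
    return [(a + b) & MASK64 for a, b in zip(dn, dm)]

def vaddw_u32(qn, dm):
    """Model of VADDW.U32: 64-bit + 32-bit lanes -> 64-bit lanes."""
    return [(a + b) & MASK64 for a, b in zip(qn, dm)]

def vpaddl_u32(lanes):
    """Model of VPADDL.U32: add adjacent 32-bit lanes into 64-bit lanes."""
    return [(lanes[i] + lanes[i + 1]) & MASK64
            for i in range(0, len(lanes), 2)]

def vpadal_u32(acc, lanes):
    """Model of VPADAL.U32: like VPADDL, but accumulate into 64-bit lanes."""
    return [(acc[i // 2] + lanes[i] + lanes[i + 1]) & MASK64
            for i in range(0, len(lanes), 2)]
```

The point of the widening forms is that no carry is lost: two 32-bit values always sum to something that fits in 64 bits, so only VADDW and VPADAL can wrap, and only once the 64-bit accumulator itself overflows.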

  There is a narrowing add insn which might still be interesting:

    VADDHN 64+64->32     Dd[n] = (Qn[n] + Qm[n]) >> 32

Useful.  Is there any 32+32 -> 32 variant that returns (a + b) >> 32?  I.e., just the carry-out.
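
For reference, a small Python sketch of what VADDHN computes on 64-bit lanes, next to the carry-out operation asked about here. The second function, carry_out_u32, is purely hypothetical: it only spells out the value being asked for and does not correspond to any instruction I am claiming exists. Names and the list-of-lanes representation are assumptions of the sketch.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1

def vaddhn_i64(qn, qm):
    """Model of VADDHN.I64: 64-bit add, keep the high 32 bits of each lane."""
    return [(((a + b) & MASK64) >> 32) & MASK32 for a, b in zip(qn, qm)]

def carry_out_u32(dn, dm):
    """Hypothetical op: the carry (0 or 1) out of a 32-bit + 32-bit add."""
    return [(a + b) >> 32 for a, b in zip(dn, dm)]
```

In this model the carry-out equals the widened sum shifted right by 32, so one possible way to obtain it would be a widening add followed by a narrowing shift, at the cost of a temporary Q register.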

  Suppose you're looking to do a final sum in a vector and will
  subsequently be shifting the data for addition into the next column.
  You have two choices:

  	vadd.i64	Qc0, Qa0, Qb0
  	vsra.i64	Qc1, Qc0, #32
  or
  	vaddhn.i64	Dtmp, Qa0, Qb0
  	vaddw.u32	Qd1, Qc1, Dtmp

  Such a re-ordering might be able to make data available for input
  earlier.  Or may be able to store data away in a single D register
  rather than keeping around the double Q, easing register pressure.

These instructions really open many possibilities.
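
As a check that the two quoted sequences feed the same value into the next column, here is a minimal Python sketch that models both on 64-bit lanes. It assumes the shift in the first sequence is a logical (unsigned) one, matching the unsigned vaddw in the second; the function names and list-of-lanes representation are made up for illustration.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1

def seq_vadd_vsra(a0, b0, c1):
    """vadd.i64 c0 = a0 + b0, then c1 += c0 >> 32 (logical shift assumed)."""
    c0 = [(a + b) & MASK64 for a, b in zip(a0, b0)]
    return [(c + (s >> 32)) & MASK64 for c, s in zip(c1, c0)]

def seq_vaddhn_vaddw(a0, b0, c1):
    """vaddhn.i64 tmp = high32(a0 + b0), then vaddw.u32 d1 = c1 + tmp."""
    tmp = [(((a + b) & MASK64) >> 32) & MASK32 for a, b in zip(a0, b0)]
    return [(c + t) & MASK64 for c, t in zip(c1, tmp)]

# Example: two 64-bit lanes per Q register.
a0 = [0x00000000FFFFFFFF, 0x0000000100000005]
b0 = [0x0000000000000001, 0x00000002FFFFFFFF]
c1 = [7, 8]
assert seq_vadd_vsra(a0, b0, c1) == seq_vaddhn_vaddw(a0, b0, c1) == [8, 12]
```

The sketch only compares the value added into the next column. The first sequence also leaves the full 64-bit sum in Qc0, whereas the second keeps just the high halves in a single D register, which is where the register-pressure argument comes from.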

I won't be hacking any Neon assembly for the time being, and will instead
hope for a 0.7 c/l mul_basecase to appear from some other GNU hacker.  :-)

Torbjörn