ARM Neon popcount
Torbjorn Granlund
tg at gmplib.org
Wed Feb 27 22:27:54 CET 2013
I decided to play a bit with Neon, but instead of doing something hard
like addmul_k, I wrote an mpn_popcount. :-)
The code runs well on A15 at about 0.56 c/l (cycles per limb), but much
worse on A9 at about 2.8 c/l. (The inner loop's hard whacking on q8 is a
problem on A9; using q8 and q9 alternately shaves off about 0.4 c/l.
Still unimpressive.)
I am a novice at Neon hacking, so I am sure this can be improved in
various ways.
Specific questions:
* I completely ignore alignment. Is that bad?
* Can 32 bits be read to a dN register with zeroing of the other 32
bits? (See comment "surely we can read...".)
* Could one shave off an instruction in the final accumulation? We don't
really need 64-bit accumulators.
* Can one read four 128-bit values using just one insn (for inner loop)?
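For readers without the attachment, the general shape of a Neon popcount can be sketched in C. This is only an illustration under stated assumptions, not the attached assembly: the name popcount_limbs, the 32-bit limb type, and the CHUNK flush interval are all hypothetical. It counts bits per byte, widens pairwise, and accumulates into 32-bit lanes, widening to 64 bits only once per chunk. On targets without Neon the scalar fallback handles everything.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical flush interval (in limbs, a multiple of 4). Each 32-bit
   lane grows by at most 32 per iteration, so it cannot overflow. */
#define CHUNK ((size_t)1 << 20)

/* Scalar bit count of one 32-bit word (classic SWAR reduction). */
static unsigned popcount32(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0f0f0f0fu;
    return (x * 0x01010101u) >> 24;
}

/* Illustrative only, not GMP's mpn_popcount: count the set bits in
   n limbs, assuming 32-bit limbs. */
uint64_t popcount_limbs(const uint32_t *p, size_t n)
{
    uint64_t total = 0;
    size_t i = 0;

#if defined(__ARM_NEON)
    while (n - i >= 4) {
        size_t end = i + ((n - i) & ~(size_t)3);
        if (end - i > CHUNK)
            end = i + CHUNK;

        uint32x4_t acc = vdupq_n_u32(0);        /* 32-bit accumulators */
        for (; i < end; i += 4) {
            uint8x16_t b = vreinterpretq_u8_u32(vld1q_u32(p + i));
            uint8x16_t c = vcntq_u8(b);         /* per-byte bit counts */
            uint16x8_t c16 = vpaddlq_u8(c);     /* pairwise widen to 16 bits */
            acc = vpadalq_u16(acc, c16);        /* widen and accumulate */
        }
        /* Widen to 64 bits only when flushing the chunk. */
        uint64x2_t acc64 = vpaddlq_u32(acc);
        total += vgetq_lane_u64(acc64, 0) + vgetq_lane_u64(acc64, 1);
    }
#endif

    /* Tail limbs, or all limbs when Neon is unavailable. */
    for (; i < n; i++)
        total += popcount32(p[i]);
    return total;
}
```

Calling popcount_limbs(a, 5) on the limbs 0xffffffff, 0, 1, 0x80000000, 0xf0f0f0f0 should give 50 on either path. Whether a compiler turns this into anything close to hand-scheduled code on A9 or A15 is a separate question.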
-------------- next part --------------
A non-text attachment was scrubbed...
Name: arm-popcount.asm
Type: application/octet-stream
Size: 2670 bytes
Desc: not available
URL: <http://gmplib.org/list-archives/gmp-devel/attachments/20130227/259ab9c7/attachment.obj>
-------------- next part --------------
--
Torbjörn