arm "neon"

Torbjorn Granlund tg at
Sat Jan 12 20:52:43 CET 2013

nisse at (Niels Möller) writes:

  Hmm, if I understand you correctly, it is preferable if the cpu can
  start doing the multiplication without any dependency on the carry from
  previous iteration, right? At least in theory, umaal could be
  implemented in such a way.
IIRC, the read of the accumulating registers is a cycle later than the
read of the multiplicand registers on A9.
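
To make the dependency concrete, here is a rough C model of what umaal computes, used in an addmul_1-style loop. The names umaal_model and addmul_1_model are made up for illustration and are not GMP code. The point is only that the product up[i] * v does not depend on the carry limb; the carry enters in the final addition.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of UMAAL: a * b + lo + hi as one 64-bit value.
   The sum cannot overflow: (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1. */
static uint64_t umaal_model(uint32_t lo, uint32_t hi, uint32_t a, uint32_t b)
{
    return (uint64_t)a * b + lo + hi;
}

/* Hypothetical addmul_1-style loop: rp[] += up[] * v over n limbs,
   returning the carry-out limb. */
static uint32_t addmul_1_model(uint32_t *rp, const uint32_t *up,
                               size_t n, uint32_t v)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* up[i] * v is independent of carry; only the add needs it. */
        uint64_t t = umaal_model(rp[i], carry, up[i], v);
        rp[i] = (uint32_t)t;          /* low limb goes back to memory */
        carry = (uint32_t)(t >> 32);  /* high limb feeds the next step */
    }
    return carry;
}
```

Whether a given core actually overlaps the multiply with the wait for the carry is a microarchitecture question that this model says nothing about.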

  1. What are the calling conventions?
It is probably easiest to compile some trivial examples to assembly.  I
look at gcc/config/arm/arm.h and its CALL_USED_REGISTERS to figure out
the register partitioning...  (That macro exists for all machines.)
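
A trivial example of the kind meant here might look like the following; the function names are made up. Compiling it with something like "gcc -O2 -S" on the target (or with a cross compiler) and reading the output shows where arguments and results live. The register assignments mentioned in the comments are what the usual ARM procedure call standard is expected to give, and should be checked against the actual compiler output.

```c
#include <stdint.h>

/* Five 32-bit arguments: the first four are expected in r0-r3 and the
   fifth on the stack (assumption, to be verified in the .s file). */
uint32_t sum5(uint32_t a, uint32_t b, uint32_t c, uint32_t d, uint32_t e)
{
    return a + b + c + d + e;
}

/* 64-bit arguments and result: the assembly shows which register
   pairs the compiler uses for them. */
uint64_t add64(uint64_t x, uint64_t y)
{
    return x + y;
}
```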

  2. What gcc flags should I use to be able to get uint64_t variables into
     neon registers?
No idea.  I doubt there is one.

I think armhf ("hard float") is a separate ABI, which passes float
parameters in fp regs.

  I've been looking primarily for operations useful for crypto. Like wide
  xor, shift/rotate, other data shuffling. Or just using the additional
  registers to store uint64_t variables would give a decent speedup over
  using the regular registers, I imagine.
Most (non-mul) stuff is available at data sizes of up to 64 bits.  The
registers are 128 bits wide, IIRC.
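
As a sketch of the "wide xor" case, the following xors two arrays of 64-bit words, two words per 128-bit register when the compiler reports Neon support, with a plain C loop otherwise. The function name xor_words is made up; the intrinsics (vld1q_u64, veorq_u64, vst1q_u64) are the standard ones from arm_neon.h. No claim is made about how much faster this is than the scalar loop.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* dst[i] ^= src[i] for n 64-bit words. */
static void xor_words(uint64_t *dst, const uint64_t *src, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Two 64-bit lanes per 128-bit q register. */
    for (; i + 2 <= n; i += 2) {
        uint64x2_t a = vld1q_u64(dst + i);
        uint64x2_t b = vld1q_u64(src + i);
        vst1q_u64(dst + i, veorq_u64(a, b));
    }
#endif
    /* Scalar tail, and the whole loop when Neon is not enabled. */
    for (; i < n; i++)
        dst[i] ^= src[i];
}
```

With gcc on 32-bit ARM, the Neon path is only compiled in when Neon is enabled on the command line (for example with -mfpu=neon); otherwise the scalar loop is used.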

The load and store insns are cool, allowing various strides and (for
load) padding.

  > Using Neon in a robust way might be a bit tricky, though.  I have no
  > idea how to determine if a CPU has Neon or not, and ARM has made most
  > useful meta instructions supervisor-only.
  For a start, I guess it could be a configure time option (with no
  fat-binary things). Either explicit, or automatically based on, e.g.,
  linux' /proc/cpuinfo which lists available cpu extensions.
I feel that grepping in a /proc file is perhaps OK for configure, but
still not great.
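
For reference, the /proc/cpuinfo check could be sketched in C as below. It assumes the 32-bit ARM Linux layout, where a line starting with "Features" lists space-separated extension names such as "neon"; the helper names has_feature and cpuinfo_has_neon are made up. It shares the weakness of the grep approach: it only works where that file exists and has that format.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if name occurs as a whole space-separated word in line. */
static int has_feature(const char *line, const char *name)
{
    size_t len = strlen(name);
    const char *p = line;

    if (len == 0)
        return 0;
    while ((p = strstr(p, name)) != NULL) {
        int start_ok = p == line || p[-1] == ' ' || p[-1] == '\t'
                       || p[-1] == ':';
        int end_ok = p[len] == '\0' || p[len] == ' ' || p[len] == '\n';
        if (start_ok && end_ok)
            return 1;
        p += len;  /* partial match (e.g. "neonx"), keep searching */
    }
    return 0;
}

/* Return 1 if a "Features" line in /proc/cpuinfo lists "neon",
   0 if none does, -1 if the file cannot be opened. */
static int cpuinfo_has_neon(void)
{
    char buf[1024];
    int found = 0;
    FILE *f = fopen("/proc/cpuinfo", "r");

    if (f == NULL)
        return -1;
    while (fgets(buf, sizeof buf, f) != NULL) {
        if (strncmp(buf, "Features", 8) == 0 && has_feature(buf, "neon")) {
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}
```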

