tg at gmplib.org
Sat Jan 12 20:52:43 CET 2013
nisse at lysator.liu.se (Niels Möller) writes:
Hmm, if I understand you correctly, it is preferable if the cpu can
start doing the multiplication without any dependency on the carry from
previous iteration, right? At least in theory, umaal could be
implemented in such a way.
IIRC, the read of the accumulating registers is a cycle later than the
read of the multiplicand registers on A9.
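
To make the dependency structure concrete, here is a minimal C sketch of what umaal computes and how it would sit in an addmul_1-style loop. The names umaal_model and addmul_1_sketch are made up for illustration and are not GMP code. The product up[i] * v does not depend on the previous iteration; only the two accumulated operands (rp[i] and the carry) do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of ARM's UMAAL: the 64-bit value a*b + c + d.
   It cannot overflow, since (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1. */
static uint64_t umaal_model(uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
    return (uint64_t)a * b + c + d;
}

/* Hypothetical helper (not GMP's mpn_addmul_1):
   rp[0..n-1] += up[0..n-1] * v, returning the carry-out limb. */
static uint32_t addmul_1_sketch(uint32_t *rp, const uint32_t *up,
                                size_t n, uint32_t v)
{
    uint32_t carry = 0;
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        /* The multiply needs only up[i] and v; carry enters as an addend. */
        uint64_t t = umaal_model(up[i], v, rp[i], carry);
        rp[i] = (uint32_t)t;            /* low half goes back to memory */
        carry = (uint32_t)(t >> 32);    /* high half feeds the next limb */
    }
    return carry;
}
```

If the accumulating operands really are read a cycle later than the multiplicands, the loop-carried latency through carry would be shorter than the full instruction latency, but that is a property of the core, not of this C model.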
1. What are the calling conventions?
It is probably easiest to compile some trivial examples to assembly. I
look at gcc/config/arm/arm.h and its CALL_USED_REGISTERS to figure out
the register partitioning... (That macro exists for all machines.)
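
A sketch of the kind of trivial example meant here; the function names and the file name example.c are arbitrary. Compiling with something like "gcc -O2 -S example.c" (using whatever ARM compiler is installed) and reading example.s should show where the arguments arrive and where the result is returned.

```c
#include <stdint.h>

/* Two 64-bit arguments and a 64-bit result: check which register
   pairs carry a and b, and where the sum is returned. */
uint64_t add64(uint64_t a, uint64_t b)
{
    return a + b;
}

/* Six 32-bit arguments: probably more than fit in the argument
   registers, so the assembly should show how the rest are passed. */
uint32_t sum6(uint32_t a, uint32_t b, uint32_t c,
              uint32_t d, uint32_t e, uint32_t f)
{
    return a + b + c + d + e + f;
}
```

Adding a function that needs many live values across a call should likewise show which registers the compiler treats as callee-saved.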
2. What gcc flags should I use to be able to get uint64_t variables into
No idea. I doubt there is one.
I think armhf ("hard float") is a separate ABI, which passes float
parameters in fp regs.
I've been looking primarily for operations useful for crypto. Like wide
xor, shift/rotate, other data shuffling. Or just using the additional
registers to store uint64_t variables would give a decent speedup over
using the regular registers, I imagine.
Most (non-mul) stuff is available at data sizes of up to 64 bits. The
registers are 128 bits wide, IIRC.
The load and store insns are cool, allowing various strides and (for
> Using Neon in a robust way might be a bit tricky, though. I have no
> idea how to determine if a CPU has Neon or not, and ARM has made most
> useful meta instructions supervisor-only.
For a start, I guess it could be a configure time option (with no
fat-binary things). Either explicit, or automatically based on, e.g.,
linux' /proc/cpuinfo which lists available cpu extensions.
I feel that grepping in a /proc file is perhaps OK for configure, but
still not great.
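
For reference, a minimal sketch of what such a check could look like in C. It assumes the ARM Linux layout, where /proc/cpuinfo has a line starting with "Features" that lists extensions such as "neon" separated by spaces. The function names are made up for illustration, and nothing here is part of GMP's configure.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if flag occurs in line as a whole word, else 0.
   Whole-word matching keeps "vfp" from matching "vfpv3". */
static int line_has_flag(const char *line, const char *flag)
{
    const char *p = line;
    size_t len;
    assert(flag != NULL && flag[0] != '\0');
    len = strlen(flag);
    while ((p = strstr(p, flag)) != NULL) {
        int starts = (p == line) || p[-1] == ' ' || p[-1] == '\t'
                     || p[-1] == ':';
        int ends = p[len] == '\0' || p[len] == ' ' || p[len] == '\t'
                   || p[len] == '\n';
        if (starts && ends)
            return 1;
        p += len;
    }
    return 0;
}

/* Return 1 if a "Features" line in the file at path lists flag,
   0 if not, and -1 if the file cannot be opened. */
static int cpuinfo_has_feature(const char *path, const char *flag)
{
    char buf[1024];
    int found = 0;
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    while (fgets(buf, sizeof buf, f) != NULL) {
        if (strncmp(buf, "Features", 8) == 0 && line_has_flag(buf, flag)) {
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}
```

A call such as cpuinfo_has_feature("/proc/cpuinfo", "neon") only describes the machine it runs on, so it shares the limitation of grepping at configure time: the build host need not be the machine the library ends up running on.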