nisse at lysator.liu.se
Sat Jan 12 19:30:16 CET 2013
Torbjorn Granlund <tg at gmplib.org> writes:
> It is OK for addmul_1, but our usage suffers from that they are on a
> tight critical path.
Hmm, if I understand you correctly, it is preferable if the cpu can
start doing the multiplication without any dependency on the carry from
the previous iteration, right? At least in theory, umaal could be
implemented in such a way.
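
To make the dependency structure concrete, here is a rough portable C model of what umaal computes (a 32x32 multiply plus two 32-bit addends, giving a 64-bit result) and of an addmul_1-style loop built on it. The names umaal_model and addmul_1_model are made up for illustration and are not GMP functions; this is only a sketch of the arithmetic, not of how any real cpu schedules it. The point is that the product up[i] * v depends only on the inputs, and only the final additions need the carry from the previous iteration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of ARM's UMAAL: returns a*b + lo + hi.
   The sum cannot overflow 64 bits, since
   (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1. */
static uint64_t umaal_model(uint32_t a, uint32_t b, uint32_t lo, uint32_t hi)
{
    return (uint64_t)a * b + lo + hi;
}

/* Illustrative addmul_1-style loop on 32-bit limbs:
   {rp, n} += {up, n} * v, returning the carry-out limb.
   Only the additions inside umaal_model depend on `carry`;
   the product up[i] * v does not. */
static uint32_t addmul_1_model(uint32_t *rp, const uint32_t *up,
                               size_t n, uint32_t v)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t t = umaal_model(up[i], v, rp[i], carry);
        rp[i] = (uint32_t)t;          /* low half goes back to memory */
        carry = (uint32_t)(t >> 32);  /* high half feeds the next limb */
    }
    return carry;
}
```

Whether hardware actually starts the multiply before the carry is available is a separate question, which this model cannot answer.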
Thanks! I have a couple of additional newbie questions:
1. What are the calling conventions?
2. What gcc flags should I use to be able to get uint64_t variables into
(I'll look into that when I'm back at work on Monday, and I expect the
answers to be easy to find, so it's not urgent).
> I haven't played with Neon much. There are lots of instructions there
> which might be useful for us. At least lshift, lshiftc, rshift,
> popcount, hamdist, copyi, copyd, and com could be improved.
One difference from x86 SIMD (beyond style) is that there seem to be
several widening instructions, with 32-bit inputs and 64-bit outputs,
both related to multiplication and addition.
I've been looking primarily for operations useful for crypto. Like wide
xor, shift/rotate, other data shuffling. Or just using the additional
registers to store uint64_t variables would give a decent speedup over
using the regular registers, I imagine.
> Using Neon in a robust way might be a bit tricky, though. I have no
> idea how to determine if a CPU has Neon or not, and ARM has made most
> useful meta instructions supervisor-only.
For a start, I guess it could be a configure time option (with no
fat-binary things). Either explicit, or automatically based on, e.g.,
Linux's /proc/cpuinfo, which lists available cpu extensions.
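
As a rough sketch of the /proc/cpuinfo idea, something like the following could be run at configure time. It assumes the layout used by 32-bit ARM Linux kernels, where a line starting with "Features" lists space-separated flags including "neon"; other kernels may name things differently. The helper names line_has_flag and cpuinfo_has_neon are made up for this example.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if `flag` occurs in `line` as a whole word delimited by
   spaces, tabs, a newline, or the ends of the string; 0 otherwise. */
static int line_has_flag(const char *line, const char *flag)
{
    size_t len = strlen(flag);
    const char *p = line;

    if (len == 0)
        return 0;
    while ((p = strstr(p, flag)) != NULL) {
        int start_ok = (p == line) || p[-1] == ' ' || p[-1] == '\t';
        char end = p[len];
        int end_ok = end == '\0' || end == ' ' || end == '\t' || end == '\n';
        if (start_ok && end_ok)
            return 1;
        p++;  /* substring of a longer word; keep searching */
    }
    return 0;
}

/* Returns 1 if a "Features" line in the file at `path` lists neon,
   0 if no such line does, and -1 if the file cannot be opened.
   Assumption: each Features line fits in the buffer. */
static int cpuinfo_has_neon(const char *path)
{
    char buf[1024];
    int found = 0;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    while (fgets(buf, sizeof buf, f) != NULL) {
        if (strncmp(buf, "Features", 8) == 0 && line_has_flag(buf, "neon")) {
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}
```

Calling cpuinfo_has_neon("/proc/cpuinfo") on the build machine only describes that machine, so this would not help when cross-compiling; an explicit configure option would still be needed for that case.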
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.