tg at gmplib.org
Sat Jan 12 16:59:38 CET 2013
nisse at lysator.liu.se (Niels Möller) writes:
> I spent most of Friday reading the arm instruction reference (primarily
> motivated by a different project). It seems current GMP loops are based
> on umaal, which appears to be tailor-made for addmul_1.
It is OK for addmul_1, but our usage suffers from the umaal instructions
being on a tight critical path. For addmul_2 this is not a problem. I
suspect addmul_1 should not really use umaal, at least not on A15.
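To make the critical-path point concrete, here is a portable C model of what UMAAL computes (RdHi:RdLo = Rn * Rm + RdLo + RdHi) and an addmul_1 loop built on it, with 32-bit limbs. It is only a sketch, not GMP's assembly, and umaal_model and addmul_1_model are hypothetical names. The thing to notice is that the carry written by one UMAAL is read by the next one, so the loop is bound by UMAAL latency rather than throughput.

```c
#include <stdint.h>
#include <stddef.h>

/* Portable model of ARM UMAAL: hi:lo = n * m + lo + hi.
   Cannot overflow 64 bits: (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1. */
static void umaal_model(uint32_t *lo, uint32_t *hi, uint32_t n, uint32_t m)
{
    uint64_t t = (uint64_t)n * m + *lo + *hi;
    *lo = (uint32_t)t;
    *hi = (uint32_t)(t >> 32);
}

/* rp[0..n-1] += up[0..n-1] * v; returns the carry-out limb.
   One UMAAL per limb: lo = rp[i], hi = running carry.
   The carry produced in iteration i is an input to iteration i+1,
   so all the UMAALs form a single serial dependency chain. */
static uint32_t addmul_1_model(uint32_t *rp, const uint32_t *up,
                               size_t n, uint32_t v)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t lo = rp[i];
        umaal_model(&lo, &carry, up[i], v);
        rp[i] = lo;
    }
    return carry;
}
```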
> But in the instruction list, I also noticed VMULL, which can do two
> 32x32->64 products in parallel (too bad it doesn't support 64-bit inputs,
> as far as I see). Has anyone played with that? And in general, where can
> I find info on the timing of arm instructions (for, say, the most common
> A9 and A15 implementations)?
> I found the A9 manual here:
> The corresponding A15 manual seems less forthcoming wrt cycle numbers.
Login to parma, explore!
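For reference, here is a portable C model of a two-lane widening multiply in the style of VMULL.U32 (in arm_neon.h the corresponding intrinsic is vmull_u32, taking two uint32x2_t and returning a uint64x2_t). The structs and function names below are hypothetical stand-ins so the sketch runs anywhere; it only shows the data flow of one 32-bit limb times two multiplier limbs, as an addmul_2-style loop would want. The lanes are independent, so carries between the 64-bit products still have to be handled separately.

```c
#include <stdint.h>

/* Hypothetical stand-ins for NEON's uint32x2_t and uint64x2_t. */
typedef struct { uint32_t lane[2]; } u32x2;
typedef struct { uint64_t lane[2]; } u64x2;

/* Model of a widening lane-wise multiply: each 32-bit lane pair
   yields a full 64-bit product; no interaction between lanes. */
static u64x2 vmull_u32_model(u32x2 a, u32x2 b)
{
    u64x2 r;
    for (int i = 0; i < 2; i++)
        r.lane[i] = (uint64_t)a.lane[i] * b.lane[i];
    return r;
}

/* One limb u times two multiplier limbs v0 and v1 in a single
   (modelled) instruction. Combining the products into the result
   and propagating carries is left to the caller. */
static u64x2 mul_by_two_limbs(uint32_t u, uint32_t v0, uint32_t v1)
{
    u32x2 a = { { u, u } };
    u32x2 b = { { v0, v1 } };
    return vmull_u32_model(a, b);
}
```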
I haven't played with Neon much. There are lots of instructions there
which might be useful for us. At least lshift, lshiftc, rshift,
popcount, hamdist, copyi, copyd, and com could be improved.
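As an illustration of why lshift is a good Neon candidate, here is a plain C reference left shift with 32-bit limbs. ref_lshift is a hypothetical name; its interface loosely follows mpn_lshift (shift by 0 < cnt < 32, return the bits shifted out of the top limb). Each result limb depends on only two adjacent input limbs and there is no carry chain, so several limbs could be shifted at once in vector lanes. No Neon version is shown here.

```c
#include <stdint.h>
#include <stddef.h>

/* rp[0..n-1] = up[0..n-1] << cnt, for n >= 1 and 0 < cnt < 32.
   Returns the bits shifted out of up[n-1].
   Works from the top limb down, so rp == up (in place) is allowed. */
static uint32_t ref_lshift(uint32_t *rp, const uint32_t *up,
                           size_t n, unsigned cnt)
{
    unsigned tnc = 32 - cnt;
    uint32_t retval = up[n - 1] >> tnc;
    /* Each output limb combines two neighbouring input limbs only. */
    for (size_t i = n - 1; i > 0; i--)
        rp[i] = (up[i] << cnt) | (up[i - 1] >> tnc);
    rp[0] = up[0] << cnt;
    return retval;
}
```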
While x86's SIMD seems to have as little organisation as a garbage dump,
Neon is carefully designed. It is a nice change. Neon is surprisingly
powerful. They generalised instructions in a nice way.
Using Neon in a robust way might be a bit tricky, though. I have no
idea how to determine if a CPU has Neon or not, and ARM has made most
useful meta instructions supervisor-only.