arm "neon"

Niels Möller nisse at
Sat Jan 12 16:22:38 CET 2013

I spent most of Friday reading the arm instruction reference (primarily
motivated by a different project). It seems current GMP loops are based
on umaal, which appears to be tailor-made for addmul_1.
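
To make the umaal remark concrete, here is a minimal sketch in portable C of
what the instruction computes and how an addmul_1 loop can be built on it.
The names umaal_model and addmul_1_sketch are hypothetical, 32-bit limbs are
assumed, and this is not GMP's actual code (which is hand-written assembly).

```c
#include <stdint.h>
#include <stddef.h>

/* Portable model of UMAAL semantics: hi:lo = a * b + lo + hi.
   The sum cannot overflow 64 bits, since
   (2^32 - 1)^2 + 2 * (2^32 - 1) = 2^64 - 1. */
static inline void umaal_model(uint32_t *lo, uint32_t *hi,
                               uint32_t a, uint32_t b)
{
    uint64_t t = (uint64_t)a * b + *lo + *hi;
    *lo = (uint32_t)t;
    *hi = (uint32_t)(t >> 32);
}

/* rp[0..n-1] += up[0..n-1] * v; returns the carry-out limb.
   Hypothetical helper, assuming 32-bit limbs. */
uint32_t addmul_1_sketch(uint32_t *rp, const uint32_t *up,
                         size_t n, uint32_t v)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t lo = rp[i];
        /* One umaal per limb: product + old rp limb + incoming carry. */
        umaal_model(&lo, &carry, up[i], v);
        rp[i] = lo;
    }
    return carry;
}
```

Each iteration needs only a load, one multiply-accumulate and a store, with
no separate carry handling, which is presumably why the instruction fits
addmul_1 so well.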

But in the instruction list, I also noticed VMULL, which can do two
32x32->64 products in parallel (too bad it doesn't support 64-bit inputs,
as far as I can see). Has anyone played with that? And in general, where
can I find info on the timing of arm instructions (for, say, the most
common A9 and A15 implementations)?
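
As an illustration of the VMULL point, the sketch below models the two-lane
32x32->64 multiply in portable C and shows how a 64x64->128 product could be
assembled from two such operations, given that 64-bit inputs are not
supported. The types u32x2 and u64x2 and the functions vmull_u32_model and
mul_64x64_sketch are hypothetical names for this example; on real hardware
the corresponding intrinsic in arm_neon.h is vmull_u32. Nothing here says
anything about timing.

```c
#include <stdint.h>

/* Hypothetical stand-ins for a 64-bit D register holding two 32-bit lanes
   and a 128-bit Q register holding two 64-bit lanes. */
typedef struct { uint32_t lane[2]; } u32x2;
typedef struct { uint64_t lane[2]; } u64x2;

/* Model of VMULL.U32: two independent 32x32->64 products, lane by lane. */
static u64x2 vmull_u32_model(u32x2 a, u32x2 b)
{
    u64x2 r;
    r.lane[0] = (uint64_t)a.lane[0] * b.lane[0];
    r.lane[1] = (uint64_t)a.lane[1] * b.lane[1];
    return r;
}

/* 64x64->128 product from four 32x32 partial products, two per VMULL. */
static void mul_64x64_sketch(uint64_t a, uint64_t b,
                             uint64_t *hi, uint64_t *lo)
{
    u32x2 av = {{ (uint32_t)a, (uint32_t)(a >> 32) }};
    u32x2 b0 = {{ (uint32_t)b, (uint32_t)b }};
    u32x2 b1 = {{ (uint32_t)(b >> 32), (uint32_t)(b >> 32) }};

    u64x2 p = vmull_u32_model(av, b0);  /* { a0*b0, a1*b0 } */
    u64x2 q = vmull_u32_model(av, b1);  /* { a0*b1, a1*b1 } */

    /* Sum the three contributions to bits 32..63; at most 3 * (2^32 - 1),
       so this cannot overflow 64 bits. */
    uint64_t mid = (p.lane[0] >> 32)
                 + (uint32_t)p.lane[1] + (uint32_t)q.lane[0];

    *lo = (uint32_t)p.lane[0] | (mid << 32);
    *hi = q.lane[1] + (p.lane[1] >> 32) + (q.lane[0] >> 32) + (mid >> 32);
}
```

The combining step after the two multiplies is scalar work with carries,
which is the kind of overhead a native 64-bit-input multiply would avoid.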


Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.

More information about the gmp-devel mailing list