arm "neon"
Torbjorn Granlund
tg at gmplib.org
Mon Jan 14 14:24:34 CET 2013
The corresponding code sustains one vmull.u32 per cycle on A15. That's
4 times the multiply bandwidth of its scalar umull implementation.
It is usually tricky to make use of SIMD operations for addmul_(k) and
friends. The well-designed ARM instructions will surely make it easier,
but it might still require many instructions for shuffling intermediates
around.
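
To illustrate the shuffling concern, here is a minimal portable C sketch, not GMP's actual code. It assumes 32-bit limbs. The names model_vmull_u32, ref_addmul_1 and sketch_addmul_1_pairs are hypothetical and introduced only for this example. The first function models the two independent 32x32->64 products that one vmull.u32 yields. The last one shows that, although the products come in pairs, the carry chain that folds them into the result remains serial.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a single vmull.u32: two independent
   32x32->64 bit products, one per lane. */
static void model_vmull_u32(uint64_t out[2], const uint32_t a[2],
                            const uint32_t b[2])
{
    out[0] = (uint64_t)a[0] * b[0];
    out[1] = (uint64_t)a[1] * b[1];
}

/* Scalar reference: rp[0..n-1] += up[0..n-1] * v, returns the carry-out limb. */
static uint32_t ref_addmul_1(uint32_t *rp, const uint32_t *up, size_t n,
                             uint32_t v)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* (2^32-1)^2 + 2*(2^32-1) = 2^64-1, so t cannot overflow. */
        uint64_t t = (uint64_t)up[i] * v + rp[i] + carry;
        rp[i] = (uint32_t)t;
        carry = (uint32_t)(t >> 32);
    }
    return carry;
}

/* Same operation with the products formed two at a time, as a SIMD
   multiplier would deliver them. The accumulation loop is still serial:
   each limb needs the carry from the previous one. */
static uint32_t sketch_addmul_1_pairs(uint32_t *rp, const uint32_t *up,
                                      size_t n, uint32_t v)
{
    const uint32_t vv[2] = { v, v };
    uint32_t carry = 0;
    size_t i = 0;

    for (; i + 2 <= n; i += 2) {
        uint64_t p[2];
        model_vmull_u32(p, up + i, vv);       /* two products at once */
        for (int k = 0; k < 2; k++) {         /* serial carry chain */
            uint64_t t = p[k] + rp[i + k] + carry;
            rp[i + k] = (uint32_t)t;
            carry = (uint32_t)(t >> 32);
        }
    }
    if (i < n) {                              /* odd limb left over */
        uint64_t t = (uint64_t)up[i] * v + rp[i] + carry;
        rp[i] = (uint32_t)t;
        carry = (uint32_t)(t >> 32);
    }
    return carry;
}
```

In a real NEON routine the high and low halves of each 64-bit lane would have to be moved between lanes or registers before they can be added with carry. That movement is the shuffling overhead mentioned above.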
(Did you notice that VMUL also supports polynomial multiplication over
GF(2), the building block for GF(2^n) arithmetic? That should come in
handy for Nettle.)
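
For readers unfamiliar with the polynomial variant, the following is a small portable C sketch of what one 8-bit lane of a polynomial (carry-less) multiply computes. The function name clmul8 is hypothetical and this is a model, not NEON code. Note that the result is the unreduced product in GF(2)[x]. A GF(2^n) multiplication still needs a reduction modulo the field polynomial afterwards.

```c
#include <stdint.h>

/* Carry-less 8x8->16 bit multiply: treats a and b as polynomials over
   GF(2) and returns their product. Partial products are combined with
   XOR instead of addition, so no carries propagate between bits. */
static uint16_t clmul8(uint8_t a, uint8_t b)
{
    uint16_t r = 0;
    for (int i = 0; i < 8; i++) {
        if ((b >> i) & 1)
            r ^= (uint16_t)((uint16_t)a << i);
    }
    return r;
}
```

For example, clmul8(3, 3) gives 5, since (x + 1)^2 = x^2 + 1 over GF(2), whereas the integer product would be 9.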
--
Torbjörn