arm "neon"

Niels Möller nisse at
Wed Feb 20 22:55:54 CET 2013

Torbjorn Granlund <tg at> writes:

> You mean 10 cycles for one U limb multiplied by the 2 V limbs?
> Then 7/2 = 3.5 c/l is a good start.

Unfortunately not. speed -C -s ... mpn_addmul_2 reported around 14
cycles per U limb. Since each U limb is multiplied by both V limbs,
that's 7 c/l, compared to 2.38 c/l for the current non-SIMD code.
That is, if I interpret the speed output correctly.
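To make the 14 -> 7 c/l normalisation concrete, here is a portable scalar C sketch of the kind of loop being measured. It is not the NEON code under discussion. It assumes 32-bit limbs (as on 32-bit ARM) and the contract I understand mpn_addmul_2 to have: add {up, n} x {vp, 2} to {rp, n}, store the low n+1 limbs in rp and return the top limb. The name ref_addmul_2 and the limb_t type are hypothetical. Each iteration handles one U limb and does two 32x32->64 multiplies, one per V limb. That is why a per-U-limb cycle count is halved before comparing it with a one-multiply-per-limb figure.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limb type: 32-bit limbs, as on 32-bit ARM. */
typedef uint32_t limb_t;

/* ref_addmul_2 (hypothetical name, assumed mpn_addmul_2-like contract):
   computes {rp, n} + {up, n} * {vp, 2}, stores the low n+1 limbs in
   rp[0..n] and returns the most significant limb.
   One loop iteration = one U limb times both V limbs (two multiplies). */
limb_t ref_addmul_2(limb_t *rp, const limb_t *up, size_t n,
                    const limb_t *vp)
{
    limb_t cy0 = 0; /* carry pending at limb position i   */
    limb_t cy1 = 0; /* carry pending at limb position i+1 */

    for (size_t i = 0; i < n; i++) {
        /* Position i: u[i]*v[0] + rp[i] + cy0 <= 2^64 - 1, no overflow. */
        uint64_t p = (uint64_t) up[i] * vp[0] + rp[i] + cy0;
        rp[i] = (limb_t) p;

        /* Position i+1: u[i]*v[1] + high half of p + cy1, also fits. */
        uint64_t q = (uint64_t) up[i] * vp[1] + (p >> 32) + cy1;
        cy0 = (limb_t) q;          /* lands at position i+1 */
        cy1 = (limb_t) (q >> 32);  /* lands at position i+2 */
    }
    rp[n] = cy0;   /* limb n of the result */
    return cy1;    /* limb n+1 of the result */
}
```

Keeping one carry per pending limb position means every 64-bit intermediate is bounded by (2^32 - 1)^2 + 2(2^32 - 1) = 2^64 - 1, so no carry is ever lost.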

> What about SIMD multiply-accumulate?  IIRC, these insns have the same
> latency and throughput as non-accumulating SIMD multiplies.

I should look into that (I didn't notice any useful integer
multiply-accumulate instructions on my first reading of the manual).
But I suspect they would end up on the critical path, and then the
relevant comparison is to add latency, not multiply latency.
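A hypothetical scalar C illustration of the critical-path point (not NEON code, and not a carry-propagating bignum loop): in a multiply-accumulate loop the products are independent of one another, so a pipelined multiplier can overlap them, and only the accumulation is loop-carried. Once the pipeline is full, each iteration must wait for the previous accumulate, so that latency, rather than the multiply latency, bounds the loop. Splitting the sum over two accumulators is one generic way to shorten the serial chain. The function names are made up for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* One accumulator: every iteration's add depends on the previous one,
   so the accumulate latency is paid n times in sequence. The products
   themselves do not depend on acc and can be computed ahead. */
uint64_t mac_one_chain(const uint32_t *a, const uint32_t *b, size_t n)
{
    uint64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (uint64_t) a[i] * b[i];   /* loop-carried dependency */
    return acc;                          /* sum modulo 2^64 */
}

/* Two independent accumulators: each serial chain is about half as
   long. Same result, since addition modulo 2^64 is associative. */
uint64_t mac_two_chains(const uint32_t *a, const uint32_t *b, size_t n)
{
    uint64_t acc0 = 0, acc1 = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        acc0 += (uint64_t) a[i] * b[i];
        acc1 += (uint64_t) a[i + 1] * b[i + 1];
    }
    if (i < n)                           /* odd-length tail */
        acc0 += (uint64_t) a[i] * b[i];
    return acc0 + acc1;
}
```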


Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.

More information about the gmp-devel mailing list