nisse at lysator.liu.se
Wed Feb 20 22:55:54 CET 2013
Torbjorn Granlund <tg at gmplib.org> writes:
> You mean 10 cycles for one U limb multiplied by the 2 V limbs?
> Then 7/2 = 3.5 c/l is a good start.
Unfortunately not. speed -C -s ... mpn_addmul_2 reported around 14
cycles per U limb, so it's 7 c/l, compared to 2.38 c/l for the current
non-SIMD code, if I interpret the speed output correctly.
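
To spell out the arithmetic behind those figures, here is a minimal sketch
in C. It assumes the speed figure is cycles per U limb and that each U limb
is multiplied by both V limbs. The helper names cycles_per_limb and slowdown
are made up for illustration; they are not part of GMP or of the speed
program.

```c
/* Hypothetical helpers, not GMP API: they only restate the arithmetic. */

/* Cycles per limb product: cycles spent per U limb, divided by the
   number of V limbs handled in the same pass (2 for an addmul_2). */
static double cycles_per_limb(double cycles_per_u_limb, int v_limbs)
{
    return cycles_per_u_limb / (double) v_limbs;
}

/* How many times slower a candidate loop is than a baseline,
   with both costs given in c/l. */
static double slowdown(double candidate_cl, double baseline_cl)
{
    return candidate_cl / baseline_cl;
}
```

With the numbers above, cycles_per_limb(14.0, 2) gives 7.0, and
slowdown(7.0, 2.38) is a little over 2.9, so the SIMD attempt is roughly
three times slower per limb product than the current code.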
> What about SIMD multiply-accumulate? IIRC, these insns have the same
> latency and throughput as non-accumulating SIMD multiplies.
Should look into that (I didn't notice any useful integer
multiply-accumulate instructions on my first reading of the manual). But
I suspect you get them on the critical path, and then the relevant
comparison is to add latency, not mul latency.
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.