gcd_22
Torbjörn Granlund
tg at gmplib.org
Fri Aug 23 09:13:06 UTC 2019
nisse at lysator.liu.se (Niels Möller) writes:
The below implementation appears to pass tests, and give a modest
speedup of 0.2 cycles per input bit, or 2.5%. Benchmark, comparing C
implementations of gcd_11 and gcd_22:
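For readers following along, gcd_11 computes the greatest common divisor of two single-limb numbers. The code below is a minimal binary GCD sketch for that case. It assumes 64-bit limbs, assumes both operands are odd (the precondition GMP places on its internal gcd_11), and uses the GCC/Clang builtin `__builtin_ctzll`. The name `gcd_11_sketch` is hypothetical. This is neither GMP's code nor the implementation benchmarked above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch (not GMP's mpn_gcd_11): binary GCD of two odd,
   nonzero 64-bit limbs. */
static uint64_t
gcd_11_sketch (uint64_t u, uint64_t v)
{
  assert ((u & v & 1) == 1);    /* both operands must be odd */
  while (u != v)
    {
      /* Both are odd, so the difference is even and nonzero. */
      uint64_t diff = u > v ? u - v : v - u;
      uint64_t min = u < v ? u : v;
      u = min;
      /* min is odd, so dividing out factors of two keeps the GCD. */
      v = diff >> __builtin_ctzll (diff);
    }
  return u;
}
```

Each iteration replaces the larger operand by the odd part of the difference, so the larger operand strictly decreases until the two meet. For example, `gcd_11_sketch (1155, 455)` yields 35.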
Beware of "turbo" when counting cycles! (Relative measurements like
gcd_11 vs gcd_22 should be fine!)
The speed difference between C gcd_11 and gcd_22 is surprisingly small!
Perhaps gcd_11 should be rewritten in the style of gcd_22?
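A two-limb version can follow the same subtract-and-shift pattern, with the comparison, the subtraction (with borrow), and the right shift spread over a high and a low limb. This is a hypothetical sketch. The `dlimb` type and helper names are made up for illustration, and this is neither Niels' gcd_22 nor GMP's `mp_double_limb_t` interface. It assumes 64-bit limbs, odd operands, and `__builtin_ctzll`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-limb value hi:lo. */
typedef struct { uint64_t hi, lo; } dlimb;

/* Return nonzero if a > b. */
static int
dl_gt (dlimb a, dlimb b)
{
  return a.hi > b.hi || (a.hi == b.hi && a.lo > b.lo);
}

/* Return a - b, assuming a >= b; propagate the borrow from the low limb. */
static dlimb
dl_sub (dlimb a, dlimb b)
{
  dlimb r;
  r.lo = a.lo - b.lo;
  r.hi = a.hi - b.hi - (a.lo < b.lo);
  return r;
}

/* Shift x right until it is odd; x must be nonzero. */
static dlimb
dl_strip_twos (dlimb x)
{
  int c;
  if (x.lo == 0)
    {
      /* At least 64 trailing zeros: shift by a whole limb first. */
      x.lo = x.hi;
      x.hi = 0;
    }
  c = __builtin_ctzll (x.lo);
  if (c > 0)
    {
      x.lo = (x.lo >> c) | (x.hi << (64 - c));
      x.hi >>= c;
    }
  return x;
}

/* Binary GCD of two odd two-limb values. */
static dlimb
gcd_22_sketch (dlimb u, dlimb v)
{
  assert ((u.lo & v.lo & 1) == 1);    /* both operands must be odd */
  while (u.hi != v.hi || u.lo != v.lo)
    {
      dlimb d;
      if (dl_gt (u, v))
        {
          d = dl_sub (u, v);
          u = v;                     /* keep the smaller operand */
        }
      else
        d = dl_sub (v, u);
      v = dl_strip_twos (d);         /* odd part of the difference */
    }
  return u;
}
```

For instance, with u = 2^64 + 1 (hi = 1, lo = 1) and v = 3(2^64 + 1) (hi = 3, lo = 3), a single subtract-and-shift step makes the operands equal, and the result is 2^64 + 1.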
I did not provide a top-level gcd_22 for x86_64 as you might have seen.
The one similar to x86_64/gcd_11.asm is probably x86_64/k8/gcd_22.asm.
Perhaps it should be moved.
But as far as I can tell, that function is slower than your C gcd_22 on
some platforms, such as Intel Haswell.
I'm curious whether your C code could be made into competitive asm. One
can usually beat the compiler by some 10-30%.
Measurements for gcd_11/22 for most of our machines are in. See
https://gmplib.org/devel/tm/gmp/date.html and click on any HOSTgentoo64
tuneup link. Scroll down; after the normal *_THRESHOLD stuff come
comparative measurements of asm code. (The mpn/generic code is not
usually measured; the exception is when it appears in the default
column. I plan to fix this some day, and add a few columns for "gcc -O",
"gcc -Os", and "gcc -O2".)
--
Torbjörn
Please encrypt, key id 0xC8601622