Please update addaddmul_1msb0.asm to support ABI in mingw64
Torbjörn Granlund
tg at gmplib.org
Thu Oct 7 08:00:07 UTC 2021
nisse at lysator.liu.se (Niels Möller) writes:
> Gave it a run on my closest x86_64 (intel broadwell, no mulx), and
> numbers for mpn_addaddmul_1msb0 are not impressive. Also, it appears
> mpn_addmul_2 is significantly slower than two addmul_1.
I believe addmul_2 is inhibited for that CPU. It might still appear in
the compiled library, though. :-(
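
For reference, the comparison measured above amounts to roughly the
following; this is a minimal C sketch, and the exact carry/return
convention of the internal mpn_addmul_2 is an assumption here, not
quoted from gmp-impl.h:

#include <gmp.h>

/* Sketch of what mpn_addmul_2 (rp, up, n, vp) is assumed to compute,
   written with two mpn_addmul_1 calls:
   {rp, n+1} += {up, n} * {vp, 2}, returning the most significant limb.  */
static mp_limb_t
addmul_2_via_addmul_1 (mp_ptr rp, mp_srcptr up, mp_size_t n, mp_srcptr vp)
{
  mp_limb_t c0, c1;

  c0 = mpn_addmul_1 (rp, up, n, vp[0]);      /* rp[0..n-1] += up * v0 */
  c1 = mpn_addmul_1 (rp + 1, up, n, vp[1]);  /* rp[1..n]   += up * v1 */
  c1 += mpn_add_1 (rp + n, rp + n, 1, c0);   /* fold v0's carry into rp[n] */
  return c1;
}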
>   79 #1.5617    1.8006    4.3277    4.6949
>   86 #1.5702    1.7883    4.3290    4.7031
>   94 #1.5441    1.7743    4.3321    4.7018
> So there's definitely some room for improvement.
The odd instruction order of the present loop suggests it was optimised
for K8. In fact, it runs almost optimally there.
(32 loop instructions; the 6 muls each need a double issue slot, so 38
slots in total. With 3-way issue and 6-way unrolling, that gives
(32+6)/3/6 = 2.111... c/l, very close to the stated 2.167.)
Beating mul_1 + addmul_1 elsewhere without loopmixing will probably be
hard. We should probably move the present code into the k8 subdir.
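
For comparison, the mul_1 + addmul_1 combination to beat amounts to
something like the sketch below; the argument order of the internal
mpn_addaddmul_1msb0 is an assumption here, not copied from gmp-impl.h:

#include <gmp.h>

/* Sketch of the mul_1 + addmul_1 combination the asm loop has to beat:
   {rp, n} = {ap, n} * u + {bp, n} * v, with the top bit of u and v clear,
   returning the carry limb.  */
static mp_limb_t
addaddmul_1msb0_ref (mp_ptr rp, mp_srcptr ap, mp_srcptr bp,
                     mp_size_t n, mp_limb_t u, mp_limb_t v)
{
  mp_limb_t cy;

  cy = mpn_mul_1 (rp, ap, n, u);       /* rp[] = ap[] * u  */
  cy += mpn_addmul_1 (rp, bp, n, v);   /* rp[] += bp[] * v */
  /* Since the most significant bits of u and v are zero, the two
     carry limbs cannot overflow when added.  */
  return cy;
}

The msb0 restriction on u and v is what guarantees the two carry limbs
can be summed into a single limb.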
--
Torbjörn
Please encrypt, key id 0xC8601622