GCD project status?
Torbjörn Granlund
tg at gmplib.org
Tue Sep 24 08:30:53 UTC 2019
nisse at lysator.liu.se (Niels Möller) writes:
> Hmm. Definitely worth a try. But if we need explicit loads and stores
> from the structs, we'll not save that many instructions.
x86 can handle operate-from-memory at almost the same cost as
operate-from-register. But operate-to-memory is expensive.
Load/store architectures typically reach their peak execution throughput
only with a mixture of loads and operates.
> Each iteration needs to load all the values but store only half of
> them, so for each pair of values load + load + store, compared to mov,
> xor, and, xor, xor for conditionally swapping using a mask.
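For reference, the mask-based conditional swap being compared here could look roughly like the sketch below. The helper name cnd_swap_mask and the use of uint64_t limbs are assumptions made for illustration only, not code from the GCD experiments.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: exchange a[i] and b[i] for all i when swap is
   nonzero, without branching on swap.  */
static void
cnd_swap_mask (int swap, uint64_t *a, uint64_t *b, size_t n)
{
  /* All ones when swap != 0, all zeros otherwise.  */
  uint64_t mask = -(uint64_t) (swap != 0);

  for (size_t i = 0; i < n; i++)
    {
      uint64_t t = (a[i] ^ b[i]) & mask;   /* mov, xor, and */
      a[i] ^= t;                           /* xor */
      b[i] ^= t;                           /* xor */
    }
}
```

With the values already in registers, each pair costs roughly the five operations listed above; whether a compiler emits exactly that sequence depends on the target.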
You might want to explicitly load some of the values into registers.
(And perhaps use the "restrict" keyword for the swappable pointers...but
I am afraid that we'd be stepping outside the vague semantics of
restrict.)
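A minimal sketch of the pointer-swapping variant with explicit loads into locals, assuming a made-up two-field struct and a made-up subtraction step (struct gcd_val and sub_step are hypothetical names, not the real code). It avoids "restrict" entirely and shows the load-all, store-half pattern:

```c
#include <stdint.h>

/* Hypothetical layout: a value and one associated cofactor.  */
struct gcd_val
{
  uint64_t a;
  uint64_t u;
};

/* Hypothetical step: subtract the smaller value from the larger one
   and update the cofactor of the larger.  Instead of swapping the
   values, select which struct each pointer refers to.  */
static void
sub_step (struct gcd_val *x, struct gcd_val *y)
{
  int swap = x->a < y->a;
  struct gcd_val *big = swap ? y : x;
  struct gcd_val *small = swap ? x : y;

  /* Load all four values into locals explicitly.  */
  uint64_t ba = big->a, bu = big->u;
  uint64_t sa = small->a, su = small->u;

  /* Store only the two that change.  */
  big->a = ba - sa;
  big->u = bu + su;
}
```

Per pair of values this is load + load + store; the pointer selection may compile to conditional moves, but that is up to the compiler.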
> > Some measurements with method 4 and 5 are now in. Modern Intel CPUs
> > like method 5, as I had expected.
>
> Nice! With a few % margin over method 3.
8 configs now vouch for method 5.
And method 4 got its first "honourable mention"; beagle thinks it is 2nd
best. :-)
I don't think method 4 will see much use unless we find a way to
radically improve the applicability of its large tables. I made a table
size of 2048 bytes the default, just for testing purposes. The next
smaller table size is 512 bytes, which is more reasonable. It gets an
87% hit rate. I'd say the hit rate needs to get beyond 95% for method 4
to become viable.
--
Torbjörn
Please encrypt, key id 0xC8601622