GCD project status?

Torbjörn Granlund tg at gmplib.org
Tue Sep 24 08:30:53 UTC 2019


nisse at lysator.liu.se (Niels Möller) writes:

  Hmm. Definitely worth a try. But if we need explicit loads and stores
  from the structs, we'll not save that many instructions.

x86 can handle operate-from-memory at almost the same cost as
operate-from-register.  But operate-to-memory is expensive.

Load/store architectures typically reach their peak execution throughput
only with a mixture of loads and operates.

  Each iteration needs to load all the values but store only half of
  them, so for each pair of values load + load + store, compared to mov,
  xor, and, xor, xor for conditionally swapping using a mask.

You might want to explicitly load some of the values into registers.
(And perhaps use the "restrict" keyword for the swappable pointers...but
I am afraid that we'd be stepping outside the vague semantics of
restrict.)
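
For reference, the mask-based conditional swap mentioned in the quoted
text (mov, xor, and, xor, xor) can be sketched in C roughly as below.
This is only an illustration, not code from GMP: the names cnd_mask and
cnd_swap_mask and the choice of uint64_t are assumptions, and the mask is
assumed to be either all ones (swap) or zero (leave as is).

```c
#include <stdint.h>

/* Hypothetical helper: turn a condition into an all-ones or all-zeros
   mask.  Negating an unsigned 1 wraps to all ones; negating 0 gives 0. */
static inline uint64_t
cnd_mask (int cnd)
{
  return -(uint64_t) (cnd != 0);
}

/* Hypothetical helper: swap *a and *b when mask is all ones, leave both
   unchanged when mask is zero.  No branch is involved. */
static inline void
cnd_swap_mask (uint64_t *a, uint64_t *b, uint64_t mask)
{
  uint64_t t = (*a ^ *b) & mask;  /* mov, xor, and */
  *a ^= t;                        /* xor */
  *b ^= t;                        /* xor */
}
```

With the values already in registers this costs about five simple
operations per pair, which is the figure the load + load + store
alternative is being compared against.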

  > Some measurements with method 4 and 5 are now in.  Modern Intel CPUs
  > like method 5, as I had expected.

  Nice! With a few % margin over method 3.

8 configs now vouch for method 5.

And method 4 got its first "honourable mention"; beagle thinks it is 2nd
best.  :-)

I don't think method 4 will see much use unless we find a way to
radically improve the applicability of its large tables.  I made table
size 2048 the default, just for testing purposes.  The next smaller
table size is 512 bytes, which is a more reasonable size.  It gets an 87%
hit rate.  I'd say the hit rate needs to get beyond 95% for method 4 to
become viable.

-- 
Torbjörn
Please encrypt, key id 0xC8601622

