GCD project status?
Torbjörn Granlund
tg at gmplib.org
Mon Sep 23 22:25:38 UTC 2019
nisse at lysator.liu.se (Niels Möller) writes:
$ ./tune/speed -c -s1 -p100000 mpn_hgcd2_1 mpn_hgcd2_2 mpn_hgcd2_3 mpn_hgcd2_4 mpn_hgcd2_5 mpn_hgcd2_binary
overhead 6.02 cycles, precision 100000 units of 8.33e-10 secs, CPU freq 1200.00 MHz
   mpn_hgcd2_1  mpn_hgcd2_2  mpn_hgcd2_3  mpn_hgcd2_4  mpn_hgcd2_5  mpn_hgcd2_binary
1     #1668.90      1863.72      1670.73      1757.54      1738.50           2044.25
Had a look at the disassembly for the binary algorithm. The
double-precision loop needs 20 instructions for just the conditional
swap logic, 23 for the clz + shift + subtract, and 8 for the shift+add
updates of the u matrix.
Perhaps keep the to-be-swapped variables in two structs, and
conditionally swap pointers to the structs instead?
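
A rough sketch of what I mean, in C. The struct layout, the field names
and the helper names (hgcd2_side, cond_swap_ptrs, side_less) are made up
for illustration and are not taken from the GMP sources; the real loop
may well group its variables differently. The point is only that one
masked pointer exchange could replace a separate conditional swap per
variable.

```c
#include <stdint.h>

/* Hypothetical grouping of the values that are swapped together. */
struct hgcd2_side
{
  uint64_t hi, lo;   /* one double-precision operand */
  uint64_t u0, u1;   /* the matrix entries that follow that operand */
};

/* Exchange *pa and *pb when swap is nonzero, without a branch.
   Assumes pointers survive a round trip through uintptr_t, which
   holds on the usual flat-address platforms. */
static void
cond_swap_ptrs (struct hgcd2_side **pa, struct hgcd2_side **pb, int swap)
{
  uintptr_t mask = -(uintptr_t) (swap != 0);   /* all ones or all zeros */
  uintptr_t t = ((uintptr_t) *pa ^ (uintptr_t) *pb) & mask;

  *pa = (struct hgcd2_side *) ((uintptr_t) *pa ^ t);
  *pb = (struct hgcd2_side *) ((uintptr_t) *pb ^ t);
}

/* Nonzero when x < y, comparing (hi, lo) as a two-limb number. */
static int
side_less (const struct hgcd2_side *x, const struct hgcd2_side *y)
{
  return x->hi < y->hi || (x->hi == y->hi && x->lo < y->lo);
}
```

The loop would then do cond_swap_ptrs (&a, &b, side_less (a, b)) and
access everything through a and b. Whether the extra indirection costs
more than the saved swap instructions would have to be measured.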
Some measurements with methods 4 and 5 are now in. Modern Intel CPUs
like method 5, as I had expected.
--
Torbjörn
Please encrypt, key id 0xC8601622