tg at gmplib.org
Sun Sep 15 13:38:05 UTC 2019
nisse at lysator.liu.se (Niels Möller) writes:
tg at gmplib.org (Torbjörn Granlund) writes:
> Looking at when method 1 or 3 is faster than 2 is more interesting.
> Method 1 and to some extent also method 3 would benefit from asm code,
How would method 1 benefit from asm code? To use instructions providing
both quotient and remainder? For x86_64, it looks like gcc uses
xor %edx, %edx
to get quotient and remainder (i.e., using the 128/64 division
instruction; there seems there is no 64/64 divison instruction?).
I meant to write method 2 and 3.
Method 2 and 3 begs for using subtract + cmov, not sure if the compiler
is clever enough to do that for the code using masks.
The compiler is not that clever, it seems.
> Would your present tuneup/speed setup allow measuring of asm code?
I think so, ./speed mpn_hgcd2 will measure the mpn_hgcd2 function in the
library, with no extra hacks. Will maybe need some tricks to be able to
select other methods and ignore the asm version, but I'd expect that to be
Well, one will at least need to suppress the local versions by editing
> The current div1 measurements include hgcd2's own time, right? I.e., if
> we found a div1 which runs in zero cycles, the timings would not be
Yes. Do you think it makes sense to measure div1 in isolation? Since
performance depends so much on the distribution of the inputs, measuring
hgcd2 makes sense to me.
I think I agree.
The distribution of operands are the same as for a plain, old Euclid,
isn't it? So one could perhaps provide a set of uniformly distribted
random numbers for div1 and get adequate results?
I'm appending another iteration of the patch to add div2 function based
on div1 on the high limbs. Selected via HGCD2_DIV2_METHOD. Benchmarks:
HGCD2_DIV2_METHOD mpn_hgcd2_1 mpn_hgcd2_2 mpn_hgcd2_3
1 #1504.47 1740.53 1513.56 div1 + unlikely case
2 #1739.66 1858.01 1753.73 current code
3 #1615.32 1736.69 1621.95 the if:ed out version
Looks good. Ready for repo?
What is the current reasoning on suppressing the q = 1 code in the hgcd2
code calling div1 and div2? That's largely orthogomal to the current
HGCD2_DIV*_METHOD choices, I suppose, although surely q = 1 special
handling makes more sense for HGCD2_DIV1_METHOD = 1 that for the other
A combinatoric explosion would be nice to avoid.
Please encrypt, key id 0xC8601622
More information about the gmp-devel