hgcd1/2
Torbjörn Granlund
tg at gmplib.org
Sat Sep 14 09:36:28 UTC 2019
nisse at lysator.liu.se (Niels Möller) writes:
> I went ahead and committed that version, replacing the old
> HGCD2_METHOD=2. I expect it to be the fastest method on some platform.
Will be interesting to see results on thresholds.
Nobody loves the new METHOD 2. :-(
(Not many machines reported results, but some machines which I expected
might have wanted METHOD 2 did report.)
We might as well switch the default to METHOD 3 or 1.
> (We might want to arrange for longlong.h to use lzcnt instead of bsr for
> modern AMD processors; the initial two count_leading_zeros would
> terminate in one cycle instead of 8 thereby!)
Looks like you did that too.
Yes, and caused some widespread breakage with the mulx change I also
committed. (A fix is ready.)
But at least one failure, with ivyfbsd64v12, is related to tuneup.c.
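For reference, the bsr/lzcnt relation discussed above can be sketched in
plain C. This is only an illustration, not the longlong.h code: the
function names are made up for this example, and __builtin_clzll is the
GCC/Clang builtin, assumed to be available. For nonzero 64-bit x, bsr
yields the index of the highest set bit while lzcnt yields the leading
zero count directly, so the two differ by a subtraction from 63.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the highest set bit, which is what x86 bsr computes for
   nonzero x.  Portable loop, for illustration only. */
static unsigned highest_set_bit(uint64_t x)
{
  unsigned i = 0;
  assert(x != 0);          /* bsr leaves its result undefined for x == 0 */
  while (x >>= 1)
    i++;
  return i;
}

/* Leading-zero count derived from the bsr-style index. */
static unsigned clz_via_bsr(uint64_t x)
{
  return 63 - highest_set_bit(x);
}

/* Leading-zero count via the GCC/Clang builtin.  Which instruction the
   compiler emits for this depends on the target flags. */
static unsigned clz_via_builtin(uint64_t x)
{
  assert(x != 0);          /* __builtin_clzll(0) is undefined */
  return (unsigned) __builtin_clzll((unsigned long long) x);
}
```

Both functions should agree for every nonzero input; only the cost of the
underlying instruction differs between processors.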
I've now tried the similar #if:ed out div2 code, and enabling it gives
an 8% speedup on my laptop.
Nice!
Next, I think we should go ahead with the rename HGCD2_METHOD to
DIV11_METHOD or possibly HGCD2_DIV1_METHOD.
Feel free.
--
Torbjörn
Please encrypt, key id 0xC8601622