hgcd1/2
Niels Möller
nisse at lysator.liu.se
Mon Sep 9 20:39:57 UTC 2019
tg at gmplib.org (Torbjörn Granlund) writes:
> I believe METHOD 1 could almost always be pointless if METHOD 3's
> straight line code is made really efficient.
I'd guess so too. But benchmarks are better than faith alone ;-)
> That should in particular be true if the q = 1 code is removed from
> hgcd2, for machines with good multiplication throughput.
Can we remove that for everything but HGCD2_METHOD == 2? It would be
nice to not have an explosion of cases.
> The normalisation code only works if operands are not already
> normalised. Cannot they be that?
We call the normalizing div2 only when q > bh, which implies bh^2 < ah.
Then bh can't be normalized. When debugging (on 64-bit), I only saw this
code entered with size of bh exactly 32 bits, ah exactly 64 bits. But I
think it might be called with bh even smaller.
> Should be revive the commented-out div2? It looks good to me, the
> comment about that it is slower than the branchy div2 is probably from
> the era where branchy div1/div2 was great.
I haven't had the time to benchmark it, but I wouldn't be surprised if
it runs better than the other div2.
> (I suspect that to be when Vax and m68020 was hot.)
IIRC, the alternative div2 was written 2008, and benchmarked on AMD
Opteron (or at most a few years older). One or both of div1 and the
"regular" div2 were copied from much older code.
> I think there might be a slight bug or inefficiency here (from hgcd2.c):
>
> ASSERT (ah >= bh);
>
> ah -= bh;
> if (ah < (CNST_LIMB (1) << (GMP_LIMB_BITS / 2 + 1)))
> break;
>
> if (ah <= bh)
> {
> /* Use q = 1 */
> u01 += u00;
> u11 += u10;
> }
> else
> {
> mp_double_limb_t rq = div1 (ah, bh);
>
> If, in the the last 'if' above ah = bh, we have as a matter of fact q =
> 2 (with zero remainder).
The thing is, we don't want q = 2 and one number reduced to a zero. We
want to terminate with a matrix corresponding to both a and b roughly
half a limb (full limb, if it weren't for the truncation when going from
double precision to single precision), and |a - b| smaller.
If we made the change from "<=" to "<", the ah == bh case would continue to
mp_double_limb_t rq = div1 (ah, bh);
q <-- 1 mp_limb_t q = rq.d1;
ah <-- 0 ah = rq.d0;
cond true if (ah < (CNST_LIMB(1) << (GMP_LIMB_BITS / 2 + 1)))
{
/* A is too small, but q is correct. */
u01 += q * u00;
u11 += q * u10;
break;
}
so we'll do the same matrix update, u01 += u00; u11 += u10; and exit the
loop.
If we keep "<=" as is, we instead update u01, u11 and continue to the
next iteration:
bh <-- 0 bh -= ah;
cond true if (bh < (CNST_LIMB (1) << (GMP_LIMB_BITS / 2 + 1)))
break;
and exit the loop. So we'll exit the loop correctly either way, but
using "<=" avoids the div1 call and a few multiplications just before
exit.
Regards,
/Niels
--
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
More information about the gmp-devel
mailing list