# hgcd1/2

Niels Möller nisse at lysator.liu.se
Sun Sep 15 17:46:47 UTC 2019

```tg at gmplib.org (Torbjörn Granlund) writes:

> The distribution of operands are the same as for a plain, old Euclid,
> isn't it?

Should be. Or slightly different, if q = 1 is handeled specially.

> So one could perhaps provide a set of uniformly distribted
> random numbers for div1 and get adequate results?

Are you saying that the quotient of two independently uniformly
distributed numbers has a similar distribution as the euclidean quotient
sequence? Otherwise, one could tweak the distribution in some way.

>   I'm appending another iteration of the patch to add div2 function based
>   on div1 on the high limbs. Selected via HGCD2_DIV2_METHOD. Benchmarks:
>
>   HGCD2_DIV2_METHOD mpn_hgcd2_1   mpn_hgcd2_2   mpn_hgcd2_3
>                   1    #1504.47       1740.53       1513.56  div1 + unlikely case
>                   2    #1739.66       1858.01       1753.73  current code
>                   3    #1615.32       1736.69       1621.95  the if:ed out version
>
> Looks good.  Ready for repo?

I think it's quite ready, but not today. I also wonder if we can delete
the "current" code right away, to reduce number ov variants. Or if we
should wire it all up for tuning before deciding that.

> What is the current reasoning on suppressing the q = 1 code in the hgcd2
> code calling div1 and div2?

I don't quite remember, but the operations in the q == 1 case are

sub_ddmmss      (or just ah -= bh in the single-limb loop)
if (ah <= bh)   True 41.5%, if I find the right place in Knuth
u01 += u00;
u11 += u10;

So that's pretty cheap, except for the branch each mispredictions can
cost several cycles.

The case of general q obviously needs to compute a quotient, where div1
gives a candidate quotient that's usually correct. Then the update needs
two multiplies,

u01 += q * u00;
u11 += q * u10;

In addition, the computation of the remainder is not not so cheap, at
least not for the double-limb loop, where we get one or two additilan
multiplies, some carry propagation, and logic for adjusting the div1
quotient when it happens to be one off.

My guess is that special q=1 is often beneficial for the double-limb
loop, and rarely beneficial in the single-limb loop. But needs some
measurements.

Regards,
/Niels

--
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
```