# hgcd1/2

Niels Möller nisse at lysator.liu.se
Sat Sep 7 22:17:13 UTC 2019

```tg at gmplib.org (Torbjörn Granlund) writes:

> Should we rename div1 to put it into the GMP name space?  How about
> mpn_div_11?  (We could encode in its name that it is for use for small
> q, but I'd suggest that we don't do that.)
>
> That would allow for some assembly experiments.

If you think assembly will make a big difference (which seems likely),
that makes sense. And define it only when it's faster than a a div
instruction generated by the compiler?

>   Taking out powers of two makes things complicated. Taking out a factor
>   of two from one of the numbers corresponds to a matrix (1,0;0,2) with
>   determinant 2. So the inverse is not an integer matrix, but an integer
>   matrix + a shift count.
>
> Would a shift count hurt?

On second thought, I think there's a bigger problem: The way hgcd is
used, it's called on the most significant part of some numbers, but
applied to larger numbers. And a matrix based on trailing zeros in hgcd
won't be applicable to the larger numbers, since they have different
lowend bits.

>   It would be interesting to try some left-to-right binary hgcd. Logic
>   should be something like
>
>
>     if (a_bits == b_bits)
>       (a, b) <-- (min(a,b), |a-b|)
>     else if a_bits > b_bits
>       a <-- |a - b * 2^{a_bits - b_bits}|
>     else
>       b <-- |b - a * 2^{b_bits - a_bits}|
>
> OK!  This is super interesting!

I think it may work nicely on machines with slow multiplication as well
as small-quotient division. But will be some overhead to make
branch-free. I haven't tried it seriously yet.

We'll sometimes get determinant == -1, so users of hgcd2 would need to
be updated to accept that. Or maybe one can just do a swap to ensure
that we have det == +1 in all cases. E.g.,

a <-- 2b - a

corresponds to the matrix (-1, 2; 0,1) with determinant -1. But if we
swap a and b, and set

(a, b) <-- (b, 2b - a)

that's the matrix (0,1; -1, 2) with determinant +1.

In the mean time, I've updated the replacement of div2. See attached
patch, using div1 except when HGCD2_METHOD == 2. Timing on my old laptop:

\$ ./tune/speed -c -p1000000 -s1 mpn_hgcd2_1 mpn_hgcd2_2 mpn_hgcd2_3
overhead 6.00 cycles, precision 1000000 units of 8.33e-10 secs, CPU freq 1200.00 MHz
mpn_hgcd2_1   mpn_hgcd2_2   mpn_hgcd2_3
1             1492.62       2064.18      #1375.18

So this is 50% speedup over the old method.

Regards,
/Niels

-------------- next part --------------
A non-text attachment was scrubbed...
Name: hgcd2-div2.patch
Type: text/x-diff
Size: 3089 bytes
Desc: not available
URL: <https://gmplib.org/list-archives/gmp-devel/attachments/20190908/ee1ebce4/attachment.bin>
-------------- next part --------------

--
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
```