hgcd1/2

Niels Möller nisse at lysator.liu.se
Tue Sep 3 07:54:59 UTC 2019


nisse at lysator.liu.se (Niels Möller) writes:

> I think it's promising, and also a bit simpler than the current div1
> code. Some concerns:
>
> 1. I think the approximate q means that we will often reduce from a > b
>    to a close to but larger than b, so we need two iterations to get to
>    a < b. Might improve progress to add some kind of adjustment step
>    after a - q*b, checking either if result >= b or < 0.
>
> 2. There are different options for the UNLIKELY branch. One could count
>    leading zeros of b, shift, and look up a quotient, and apply
>
>    a -= q * b << k;
>
>    for suitable q and k. Likely cheaper than a division, but we'll then
>    do multiple iterations if size difference is large (which I think is
>    ok).
>
> 3. We get poor accuracy for q when b_hi == 1. One could make that less
>    likely by using a somewhat asymmetric table, say looking up 3 bits of
>    a but 6 bits of b.
>
> 4. One could also go in the other direction and do something closer to
>    left-to-right binary, and in each step always use q = 2^k for
>    suitable k, based on count_leading_zeros on both a and b.

I've tried replacing the single-precision loop in hgcd2 with this loop.
And it does give a speedup of about 5% in the Lehmer range. Patch below.

Cycle numbers on my laptop, speed -p 100000 -c -s 10-100 -f 1.1 mpn_gcd:

Before:
              mpn_gcd
10           21505.73
11           23976.73
12           26443.39
13           28978.88
14           31538.00
15           33956.00
16           36556.81
17           39220.50
18           42087.00
19           45242.36
20           47446.70
22           53891.50
24           59186.14
26           64720.50
28           70047.80
30           75426.00
33           83215.33
36           92378.00
39          101442.50
42          110168.50
46          118957.00
50          132620.00
55          147323.00
60          166901.00
66          188441.00
72          208773.00
79          236515.00
86          262802.00
94          295412.00

After:

              mpn_gcd
10           20174.23
11           22459.27
12           24851.14
13           27303.50
14           29583.15
15           31994.11
16           34313.12
17           36977.57
18           39485.92
19           42317.91
20           44573.10
22           50640.62
24           55728.29
26           60987.33
28           66027.00
30           71474.00
33           79510.00
36           87607.00
39           97009.50
42          105033.50
46          116175.00
50          129649.00
55          141719.00
60          161759.00
66          179491.00
72          200928.00
79          226226.00
86          251570.00
94          283439.00

And this while leaving the more expensive double-limb loop unchanged.

But... I get even better numbers if I keep the old code and just replace
the div1 function with plain division q = a / b:

              mpn_gcd
10           18757.85   # 15% speedup
11           20862.24
12           22963.11
13           25123.29
14           27388.40
15           29541.06
16           31587.19
17           34132.93
18           36209.33
19           39231.64
20           42615.30
22           47155.88
24           51235.71
26           56656.17
28           61496.60
30           65850.00
33           72661.33
36           81102.67
39           89463.00
42           97084.00
46          106561.00
50          117578.00
55          132398.00
60          149708.00
66          168662.00
72          188981.00
79          211562.00
86          236054.00
94          268411.00   # 10% speedup

And this on a laptop with an Intel U4100 (5 years old?), so I'd assume
it doesn't have a particularly fast div instruction. Should we just
delete div1 ? On which architectures can we expect it to be beneficial?
It should be fairly easy to find out, if we define a HGCD_DIV1_METHOD
known to tuneup, to select between plain division and the div1 function.

The case for div2 may be different.

Regards,
/Niels

-------------- next part --------------
A non-text attachment was scrubbed...
Name: hgcd2.c.patch
Type: text/x-diff
Size: 8374 bytes
Desc: not available
URL: <https://gmplib.org/list-archives/gmp-devel/attachments/20190903/d88a7058/attachment.bin>
-------------- next part --------------

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.


More information about the gmp-devel mailing list