subquadratic mpn_hgcd_appr (was: Re: Possible gcd/hgcd/gcdext improvements)
nisse at lysator.liu.se
Mon Oct 10 22:35:20 CEST 2011
Now I've written a subquadratid divide-and-conquer variant. The most
interesting new function may be hgcd_appr.c:hgcd_matrix_apply, which
does the cancellation thing using mpn_mulmod_bnm1.
Currently, mpn_hgcd recurses to mpn_hgcd, based on HGCD_THRESHOLD. And
mpn_hgcd_appr recurses to mpn_hgcd_appr, based on HGCD_APPR_THRESHOLD.
Which may seem nice and natural, but most likely not optimal.
Consider the first recursive call in each of the two functions:
nn = mpn_hgcd (ap + p, bp + p, n - p, M, tp);
if (nn > 0)
n = mpn_hgcd_matrix_adjust (M, p + nn, ap, bp, p, tp);
success = 1;
MPN_COPY (tp, ap + p, n - p);
MPN_COPY (tp + n - p, bp + p, n - p);
if (mpn_hgcd_appr (tp, tp + n - p, n - p, M, tp + 2*(n-p)))
n = hgcd_matrix_apply (M, ap, bp, n);
success = 1;
These two variants are (almost) totally equivalent, the only difference
is that the latter one may stop reducing a few bits earlier. With input
n, we produce a matrix of size n/4, and compute the corresponding
reduced numbers, of size 3n/4.
So one should really benchmark hgcd + hgcd_adjust vs hgcd_appr +
hgcd_matrix_apply, to see which one is fastest, and then *both*
functions should use the fastest variant for each size. The theory is
that hgcd_appr ought to win for large enough sizes.
And then it's not entirely obvious to me in which order this new
threshold, and the other hgcd thresholds, should be tuned. They are all
interdependent. It will be easier if the crossover for hgcd vs hgcd_appr
happens to fall within the basecase range of both hgcd functions, but
I'm not sure that's going to be the case.
With a crude manually set value for HGCD_APPR_THRESHOLD, hgcd_appr seems
to be some 10%-15% faster for large inputs (10000 limbs) (counting just
the hgcd functions, not the combinations with _adjust and __apply).
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
More information about the gmp-devel