GMP 4.3 multiplication performance
Niels Möller
nisse at lysator.liu.se
Thu Jun 4 09:27:03 CEST 2009
Torbjorn Granlund <tg at gmplib.org> writes:
> nisse at lysator.liu.se (Niels Möller) writes:
> Talking about code cleanup, I think there's a lot of duplication in
> the toom evaluation code. Would the overhead be aceptable if one
> writes a function for, e.g., evaluating a degree four polynomial in
> the points +1 and -1? [...]
> In what way would that help?
Well, in the toom44_mul I've been hacking on, I have the block
/* Compute apx = a0 + a1 + a2 + a3 and amx = a0 - a1 + a2 - a3. */
apx[n] = mpn_add_n (apx, a0, a2, n);
tp[n] = mpn_add (tp, a1, n, a3, s);
#if HAVE_NATIVE_mpn_addsub_n
if (mpn_cmp (apx, tp, n + 1) < 0)
{
mpn_addsub_n (apx, amx, tp, apx, n + 1);
flags = toom4_w3_neg;
}
else
{
mpn_addsub_n (apx, amx, apx, tp, n + 1);
flags = 0;
}
#else
if (mpn_cmp (apx, tp, n + 1) < 0)
{
mpn_sub_n (amx, tp, apx, n + 1);
flags = toom4_w3_neg;
}
else
{
mpn_sub_n (amx, apx, tp, n + 1);
flags = 0;
}
mpn_add_n (apx, apx, tp, n + 1);
#endif
ASSERT (apx[n] <= 3);
ASSERT (amx[n] <= 1);
This is followed by an identical block evaluating b, with different
(and independent) inputs and outputs. I naturally programmed this by
copy and paste, folllowed by search and replace of the parameter
names, which makes me feel somewhat dirty. It would be clearer to make
this body of code into a function, called like
flags = 0;
if (toom_evaluate_4dgr_pm1(apx, apm, ap, n, s, tp)
flags |= toom_w3_neg;
if (toom_evaluate_4dgr_pm1(bpx, bpm, bp, n, t, tp)
flags ^= toom_w3_neg;
It's better to write and optimize the code in one place. It makes it
more painful to improve the code if every improvement must be applied
to three different places with slightly different variable names and
style.
Above, there are just two uses of the function, which maybe makes it a
borderline case. But other toom4X and toomX4 variants also do the same
evaluation in these two points (currently, toom42 is the only other
use, but more are on the way. And the toom42 version actually uses
more scratch space than it should, which kind-of proves that this is a
nontrivial function).
/Niels
More information about the gmp-devel
mailing list