GMP 4.3 multiplication performance

Niels Möller nisse at lysator.liu.se
Thu Jun 4 09:27:03 CEST 2009


Torbjorn Granlund <tg at gmplib.org> writes:

> nisse at lysator.liu.se (Niels Möller) writes:

>   Talking about code cleanup, I think there's a lot of duplication in
>   the toom evaluation code. Would the overhead be aceptable if one
>   writes a function for, e.g., evaluating a degree four polynomial in
>   the points +1 and -1? [...]

> In what way would that help?

Well, in the toom44_mul I've been hacking on, I have the block

    /* Compute apx = a0 + a1 + a2 + a3 and amx = a0 - a1 + a2 - a3.  */
    apx[n] = mpn_add_n (apx, a0, a2, n);
    tp[n] = mpn_add (tp, a1, n, a3, s);
  
  #if HAVE_NATIVE_mpn_addsub_n
    if (mpn_cmp (apx, tp, n + 1) < 0)
      {
        mpn_addsub_n (apx, amx, tp, apx, n + 1);
        flags = toom4_w3_neg;
      }
    else
      {
        mpn_addsub_n (apx, amx, apx, tp, n + 1);
        flags = 0;
      }
  #else
    if (mpn_cmp (apx, tp, n + 1) < 0)
      {
        mpn_sub_n (amx, tp, apx, n + 1);
        flags = toom4_w3_neg;
      }
    else
      {
        mpn_sub_n (amx, apx, tp, n + 1);
        flags = 0;
      }
  
    mpn_add_n (apx, apx, tp, n + 1);
  #endif
  
    ASSERT (apx[n] <= 3);
    ASSERT (amx[n] <= 1);

This is followed by an identical block evaluating b, with different
(and independent) inputs and outputs. I naturally programmed this by
copy and paste, folllowed by search and replace of the parameter
names, which makes me feel somewhat dirty. It would be clearer to make
this body of code into a function, called like

  flags = 0;
  if (toom_evaluate_4dgr_pm1(apx, apm, ap, n, s, tp)
    flags |= toom_w3_neg;
  if (toom_evaluate_4dgr_pm1(bpx, bpm, bp, n, t, tp)
    flags ^= toom_w3_neg;

It's better to write and optimize the code in one place. It makes it
more painful to improve the code if every improvement must be applied
to three different places with slightly different variable names and
style.

Above, there are just two uses of the function, which maybe makes it a
borderline case. But other toom4X and toomX4 variants also do the same
evaluation in these two points (currently, toom42 is the only other
use, but more are on the way. And the toom42 version actually uses
more scratch space than it should, which kind-of proves that this is a
nontrivial function).

/Niels


More information about the gmp-devel mailing list