Toom testing, and unbalanced recursive calls in toom22.

Mon Oct 19 23:39:35 CEST 2009

nisse at lysator.liu.se (Niels Möller) writes:

  Torbjorn Granlund <tg at gmplib.org> writes:

  > nisse at lysator.liu.se (Niels Möller) writes:
  >
  >   1. Have TOOM22_MUL_MN_REC always use basecase for too unbalanced
  >      products (and trust that in practice toom22 will not be called for
  >      sizes where this is a poor choice). Also, for size 2n x n+1, the
  >      most unbalanced case mpn_toom22_mul can handle, that function is
  >      probably not the best choice.
  >   
  >   2. Have TOOM22_MUL_MN_REC choose between toom22, toom32 and basecase,
  >      using some conditions.
  >   
  >   3. Let TOOM22_MUL_MN_REC use the general mpn_mul function for
  >      unbalanced multiplication. This has implications on the use of
  >      scratch space (as long as mpn_mul doesn't use an itch/scratch
  >      interface, scratch allocated for toom22 will be wasted. And if
  >      mpn_mul is eventually itchified, this gives a rather unpleasant
  >      circular dependency.

  > I don't think toom22 should call mul.c for the point-at-oo evaluation.
  > The overhead is simply too large for these small operands.

  Makes sense.

  > I also don't think (1) is good.

  What's the problem with (1)? The change is as follows:

  --- a/mpn/generic/toom22_mul.c  Fri Oct 09 08:13:17 2009 +0200
  +++ b/mpn/generic/toom22_mul.c  Mon Oct 19 17:20:00 2009 +0200
  @@ -60,7 +60,8 @@
   #define TOOM22_MUL_MN_REC(p, a, an, b, bn, ws)                         \
     do {                                                                 \
       if (! MAYBE_mul_toom22                                             \
  -       || BELOW_THRESHOLD (bn, MUL_KARATSUBA_THRESHOLD))               \
  +       || BELOW_THRESHOLD (bn, MUL_KARATSUBA_THRESHOLD)                \
  +       || an + 2 >= 2*bn)                                              \
         mpn_mul_basecase (p, a, an, b, bn);                              \
       else                                                               \
         mpn_toom22_mul (p, a, an, b, bn, ws);                            \

  It's one more condition. This condition will be tested at all only if bn is
  above MUL_KARATSUBA_THRESHOLD (and MAYBE_mul_toom22 is true), and it will
  be true only for unusual calls to toom22, e.g., from test code.

I don't like it because it will add overhead even when mul.c will never
call toom22 in such a way that it can be true.  Always-true or
always-false conditions that cannot be folded by the compilers are bad.

If testing code needs a condition inserted into critical library code in
order to avoid triggering problems that cannot happen with GMP's own
usage, then the testing code ought to be improved.

Now, of course we need to be quite sure that GMP does indeed not call
toom22 in a way that could create these very unbalanced operands.  I
think the current GMP is *not* correctly written to make all
TOOM22_THRESHOLD and TOOM33_THRESHOLD values safe.

Let's see what happens: Each recursive toom22 call removes n (the
balanced coefficient size) from the sizes an and bn.  The oo point will
have operand sizes an' = an - n and bn' = bn - n.  So an-bn = an'-bn'
will hold, call this difference d.  The same d will be repeated for any
number of toom22 recursive calls, but an and bn will decrease.  This
means that we must not call toom22 with a d such as the ultimate operand
sizes TOOM22_THRESHOLD+d,TOOM22_THRESHOLD represent a too
unbalanced multiply.

Our current condition in mul.c therefore isn't right.  (But this doesn't
mean there is a real problem with any existing config.)

We do toom32 until 9*an > 10*bn, i.e. until d = 9*an-10*bn.  This can be
arbitrarily large.  Fortunately, it is bounded by TOOM33_THRESHOLD,
as mul.c is written.

So, doesn't this reasoning mean that your additional condition to call
mpn_mul_basecase is needed?  No, it does not mean that, I think.

Calling mul_basecase when we know mul_basecase isn't the right primitive
is not a good idea.  Either we should have called toom22 with these
operands, or toom22 will need to call toom32.  Agree?

But perhaps we should change mul.c to not potentially invoke toom22 in a
bad way.  We need to add a condition based on
TOOM22_THRESHOLD+d,TOOM22_THRESHOLD.  (However, if 2*TOOM22_THRESHOLD >=
TOOM33_THRESHOLD we could take more freedoms, since toom22 then will
never actually recurse.)

Is my reasoning right?

  (Sorry if it sounds urgent, but I'd really like to check in the new
  test code, and this issue is blocking that. So I'd like a solution
  that is simple and good enough, which can then be improved later on
  when we understand algorithm selection for unbalanced multiplication
  better).

Then, for the time being, please call toom22 from the testing code for
TOOM22_THRESHOLD <= bn <= an <= bn + TOOM22_THRESHOLD - epsilon, or
something like that.

  I don't like this approach. Even if toom22 is an internal function,
  I'd like to have simple explicit rules for its interface (such as bn
  <= an < 2*bn-2 for the current implementation). To have inputs of a
  given size work for some values of MUL_TOOM22_THRESHOLD and give
  incorrect results for other, otherwise perfectly sane, values of
  MUL_TOOM22_THRESHOLD seem like a very poor interface.

Yes, perhaps it is not pretty, but I think we need to disregard the lack
of beauty for critical low-lever interfaces.  :-)

  On the other hand, I don't think it's a big problem if performance is
  awful for unusual inputs sizes to toom22 (e.g., toom22 for a 200000 x
  140000 multiply recursing to a 100000x40000 multiply done using
  schoolbook).

That multiplication would take several seconds on a high-end machine.

-- 
Torbjörn