Toom testing, and unbalanced recursive calls in toom22.

Mon Oct 19 20:49:53 CEST 2009

Torbjorn Granlund <tg at gmplib.org> writes:

> nisse at lysator.liu.se (Niels Möller) writes:
>
>   1. Have TOOM22_MUL_MN_REC always use basecase for too unbalanced
>      products (and trust that in practice toom22 will not be called for
>      sizes where this is a poor choice). Also, for size 2n x n+1, the
>      most unbalanced case mpn_toom22_mul can handle, that function is
>      probably not the best choice.
>   
>   2. Have TOOM22_MUL_MN_REC choose between toom22, toom32 and basecase,
>      using some conditions.
>   
>   3. Let TOOM22_MUL_MN_REC use the general mpn_mul function for
>      unbalanced multiplication. This has implications on the use of
>      scratch space (as long as mpn_mul doesn't use an itch/scratch
>      interface, scratch allocated for toom22 will be wasted. And if
>      mpn_mul is eventually itchified, this gives a rather unpleasant
>      circular dependency.

> I don't think toom22 should call mul.c for the point-at-oo evaluation.
> The overhead is simply too large for these small operands.

Makes sense.

> I also don't think (1) is good.

What's the problem with (1)? The change is as follows:

--- a/mpn/generic/toom22_mul.c  Fri Oct 09 08:13:17 2009 +0200
+++ b/mpn/generic/toom22_mul.c  Mon Oct 19 17:20:00 2009 +0200
@@ -60,7 +60,8 @@
 #define TOOM22_MUL_MN_REC(p, a, an, b, bn, ws)                         \
   do {                                                                 \
     if (! MAYBE_mul_toom22                                             \
-       || BELOW_THRESHOLD (bn, MUL_KARATSUBA_THRESHOLD))               \
+       || BELOW_THRESHOLD (bn, MUL_KARATSUBA_THRESHOLD)                \
+       || an + 2 >= 2*bn)                                              \
       mpn_mul_basecase (p, a, an, b, bn);                              \
     else                                                               \
       mpn_toom22_mul (p, a, an, b, bn, ws);                            \

It's one more condition. This condition will be tested at all only if bn is
above MUL_KARATSUBA_THRESHOLD (and MAYBE_mul_toom22 is true), and it will
be true only for unusual calls to toom22, e.g., from test code.

> (2) is my favourite.

Do you have some heuristics that I should try, and which you'd be
more happy with than simply (1)? It's going to be no less overhead
than (1), but if done right it should improve performance.

One possible strategy might be to choose toom22 when 2 an/3 <= bn<=
an, toom32 when an/2 < bn < 2an/3 (then it's a good fit to toom32 and
a poor fit to toom22), and schoolbook when bn < an/2) (avoiding
calling toom32 when the ratio gets too close to 1/3, not sure if
that's good or bad). Makes sense to me, but probably not at all
optimal.

It might even make some sense to recurse to iterated toom32 rather
than schoolbook, for a highly unbalanced multiplication where bn is
still larger than the Karatsuba threshold. But I don't think that's
something I'd like to try at the moment.

(Sorry if it sounds urgent, but I'd really like to check in the new
test code, and this issue is blocking that. So I'd like a solution
that is simple and good enough, which can then be improved later on
when we understand algorithm selection for unbalanced multiplication
better).

> But we may also keep the code like it is, and simply forbid the operands
> that lead to too unbalanced recursive calls.  That's my approach in the
> measurement program generating the graphs at gmplib.org/devel/.

> This isn't too hard to find out in closed form, I suppose.  I was lazy
> in the measurement program, and wrote a recursive predicate.  It should
> be a simple function of MUL_TOOM22_THRESHOLD and (un-vn).

I don't like this approach. Even if toom22 is an internal function,
I'd like to have simple explicit rules for its interface (such as bn
<= an < 2*bn-2 for the current implementation). To have inputs of a
given size work for some values of MUL_TOOM22_THRESHOLD and give
incorrect results for other, otherwise perfectly sane, values of
MUL_TOOM22_THRESHOLD seem like a very poor interface.

On the other hand, I don't think it's a big problem if performance is
awful for unusual inputs sizes to toom22 (e.g., toom22 for a 200000 x
140000 multiply recursing to a 100000x40000 multiply done using
schoolbook).

Regards,
/Niels