What's a reasonable size ratio for toom32?

Niels Möller nisse at lysator.liu.se
Fri Jul 21 23:13:47 CEST 2023


Hi, I had a look at the unbalanced multiply in mpn_toom32_mul. Currently,
it calls mpn_mul_n for the balanced multiplies, and mpn_mul for the
unbalanced multiplies. I'm thinking that it probably should never call
anything but toom22, toom32 and basecase. 

For the balanced case, the reason is that for sizes that want toom33,
one should probably use higher toom than toom32 (but I'm not really
convinced, maybe toom33 could also make sense here).

The unbalanced case is more interesting. For toom22, we know that the
top part gets more and more unbalanced as recursion progresses, so it
currently may call toom32 (and it's not entirely to why it never gets
too unbalanced for toom32).

But for toom32, the top product does *not* necessarily get more and more
balanced as we recurse down. That was a new revelation to me. When
input limb sizes are n, m, n >= m, define the ratio as r = m / n,
similar to comments in mul.c. 

For toom32, we aim for ratio 2/3, but can allow input ratio in the range
1/3 < r < 1. Let r' be the size ratio for the top recursive multiply.
It's clear that if we, e.g., have r = 2/3, then r' = 1, so the product is
perfectly balanced. If I get the equations right we have is

  1/3 < r <= 2/3 ==> r' = 3r - 1
  2/3 <= r < 1   ==> r' = 2/r - 2

If we want to stick to calling toom22 or toom32, then we need to
restrict input ratio to some range r_1 <= r <= r_2 such that we always
have r' >= r_1. And then do the recursive multiply using toom32 if r' <=
r_2, otherwise toom22 (and still fall back to schoolbook below some
threshold, of course).

It turns out r = 1/2 is a fix point, i.e., r = 1/2 ==> r' = 1/2, so set
r_1 = 1/2, r_2 = 4/5. Then both endpoints give r' = 1/2.

This already works fine the way toom22 is called from the mpn_mul_n: It
is called only with r >= 4/5. For toom22, ratio becomes more unbalanced,
and top multiply then gets a ration r' = 2r - 1 >= 3/5. The logic
deciding whether to call toom22 or toom32 for the top product keeps
ratio in range for respective function. (If toom22 is called with far
more unbalanced arguments, I suspect one might end up with an invalid
call to toom32, though).

However, the way toom32 is called from top-level mpn_mul doesn't stick
to the range defined above, it calls toom32 with ratio in the range 4/7
< r < 6/7. The lower bound is fine, since 4/7 > 1/2. But the range between 4/5
= 0.8 and 6/7 \approx 0.857 will get toom32 to produce more and more
unbalanced arguments as it recurses.

Bottom line: If mpn_mul is fixed to not call toom32 in the above range
(toom22 could be a reasonable replacement), and stricter ranges for
toom22 and toom32 are properly documented, then we can change toom32 to
not recurse using the top-level mpn_mul. And it should also get a lot
easier to adapt toom22 and toom32 to the itch/scratch convention, if we
want to go that way.

Regards,
/Niels

-- 
Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677.
Internet email is subject to wholesale government surveillance.



More information about the gmp-devel mailing list