Toom testing, and unbalanced recursive calls in toom22.

Tue Oct 20 09:46:14 CEST 2009

Torbjorn Granlund <tg at gmplib.org> writes:

> I don't like it because it will add overhead even when mul.c will never
> call toom22 in such a way that it can be true.  Always-true or
> always-false conditions that cannot be folded by the compilers are bad.

I understand the principle, but I'm not sure it matters in this
case...

Doing (1) will be one additional well-predicted branch for one
multiplication, of size between 2*TOOM22_THRESHOLD and
4*TOOM22_THRESHOLD. I'd expect that to be one cycle added to a total
multiplication time of at least a few thousand cycles. If it kicked in
already at size 1*TOOM22_THRESHOLD, I'd be more concerned about the
overhead.

(And if the condition really is "always" false/true of course depends
on how the toom22 interface is specified).

> Let's see what happens: Each recursive toom22 call removes n (the
> balanced coefficient size) from the sizes an and bn.  The oo point will
> have operand sizes an' = an - n and bn' = bn - n.  So an-bn = an'-bn'
> will hold, call this difference d.  The same d will be repeated for any
> number of toom22 recursive calls, but an and bn will decrease.  This
> means that we must not call toom22 with a d such as the ultimate operand
> sizes TOOM22_THRESHOLD+d,TOOM22_THRESHOLD represent a too
> unbalanced multiply.

So for the final recursive call, we have an' = an - n and bn' = an - d
- n, for some n, and we have bn' >= TOOM22_THRESHOLD. We need

  an' + 2 < 2*bn'

which can be rewritten as

  d + 2 < an - n - d = bn'

By this reasoning, the requirement is that d + 2 < TOOM22_THRESHOLD is
sufficient. So the requirements on input sizes to toom22 is then

  an >= bn
  an + 2 < 2*bn
  an - bn + 2 < TOOM22_THRESHOLD 

which can be simplified to

  an >= bn
  an - bn + 2 < MIN(bn, TOOM22_THRESHOLD)

This means that toom22 supports only *slightly* unbalanced inputs.

> Then, for the time being, please call toom22 from the testing code for
> TOOM22_THRESHOLD <= bn <= an <= bn + TOOM22_THRESHOLD - epsilon, or
> something like that.

I'll try the above condition.

>   On the other hand, I don't think it's a big problem if performance is
>   awful for unusual inputs sizes to toom22 (e.g., toom22 for a 200000 x
>   140000 multiply recursing to a 100000x40000 multiply done using
>   schoolbook).
>   
> That multiplication would take several seconds on a high-end machine.

That's a suitable punishment for explicitly calling an unsuitable
internal primitive ;-). But I think it's desirable to return a correct
result even in this case, if one just has enough patience. mpn_mul and
friends should of course never call mpn_toom22_mul with such large
inputs.

Regards,
/Niels