Toom testing, and unbalanced recursive calls in toom22.
Niels Möller
nisse at lysator.liu.se
Tue Oct 20 09:46:14 CEST 2009
Torbjorn Granlund <tg at gmplib.org> writes:
> I don't like it because it will add overhead even when mul.c will never
> call toom22 in such a way that it can be true. Always-true or
> always-false conditions that cannot be folded by the compilers are bad.
I understand the principle, but I'm not sure it matters in this
case...
Doing (1) will be one additional well-predicted branch for one
multiplication, of size between 2*TOOM22_THRESHOLD and
4*TOOM22_THRESHOLD. I'd expect that to be one cycle added to a total
multiplication time of at least a few thousand cycles. If it kicked in
already at size 1*TOOM22_THRESHOLD, I'd be more concerned about the
overhead.
(And if the condition really is "always" false/true of course depends
on how the toom22 interface is specified).
> Let's see what happens: Each recursive toom22 call removes n (the
> balanced coefficient size) from the sizes an and bn. The oo point will
> have operand sizes an' = an - n and bn' = bn - n. So an-bn = an'-bn'
> will hold, call this difference d. The same d will be repeated for any
> number of toom22 recursive calls, but an and bn will decrease. This
> means that we must not call toom22 with a d such as the ultimate operand
> sizes TOOM22_THRESHOLD+d,TOOM22_THRESHOLD represent a too
> unbalanced multiply.
So for the final recursive call, we have an' = an - n and bn' = an - d
- n, for some n, and we have bn' >= TOOM22_THRESHOLD. We need
an' + 2 < 2*bn'
which can be rewritten as
d + 2 < an - n - d = bn'
By this reasoning, the requirement is that d + 2 < TOOM22_THRESHOLD is
sufficient. So the requirements on input sizes to toom22 is then
an >= bn
an + 2 < 2*bn
an - bn + 2 < TOOM22_THRESHOLD
which can be simplified to
an >= bn
an - bn + 2 < MIN(bn, TOOM22_THRESHOLD)
This means that toom22 supports only *slightly* unbalanced inputs.
> Then, for the time being, please call toom22 from the testing code for
> TOOM22_THRESHOLD <= bn <= an <= bn + TOOM22_THRESHOLD - epsilon, or
> something like that.
I'll try the above condition.
> On the other hand, I don't think it's a big problem if performance is
> awful for unusual inputs sizes to toom22 (e.g., toom22 for a 200000 x
> 140000 multiply recursing to a 100000x40000 multiply done using
> schoolbook).
>
> That multiplication would take several seconds on a high-end machine.
That's a suitable punishment for explicitly calling an unsuitable
internal primitive ;-). But I think it's desirable to return a correct
result even in this case, if one just has enough patience. mpn_mul and
friends should of course never call mpn_toom22_mul with such large
inputs.
Regards,
/Niels
More information about the gmp-devel
mailing list