Toom testing, and unbalanced recursive calls in toom22.

Tue Oct 20 12:42:31 CEST 2009

nisse at lysator.liu.se (Niels Möller) writes:

  Doing (1) will be one additional well-predicted branch for one
  multiplication, of size between 2*TOOM22_THRESHOLD and
  4*TOOM22_THRESHOLD. I'd expect that to be one cycle added to a total
  multiplication time of at least a few thousand cycles. If it kicked in
  already at size 1*TOOM22_THRESHOLD, I'd be more concerned about the
  overhead.

I agree that such a multiplication will take a few thousand cycles.  I
don't agree that this extra test just adds one cycle.  It will cost a
few cycles.

I would agree adding a few cycles to a thousand cycle operation if I
thought it was motivated.  But hunting down wasted cycles is what GMP is
about.  Or to quote Monty Python: "Every cycles is sacred; every cycle
is great; if a cycle cycle is wasted; gmp gets quite irate"

  So for the final recursive call, we have an' = an - n and bn' = an - d
  - n, for some n, and we have bn' >= TOOM22_THRESHOLD. We need

    an' + 2 < 2*bn'

  which can be rewritten as

    d + 2 < an - n - d = bn'

  By this reasoning, the requirement is that d + 2 < TOOM22_THRESHOLD is
  sufficient. So the requirements on input sizes to toom22 is then

    an >= bn
    an + 2 < 2*bn
    an - bn + 2 < TOOM22_THRESHOLD 

  which can be simplified to

    an >= bn
    an - bn + 2 < MIN(bn, TOOM22_THRESHOLD)

  This means that toom22 supports only *slightly* unbalanced inputs.

It might appear so, but since toom22 is only meant to operate between
TOOM22_THRESHOLD and TOOM33_THRESHOLD, it is not that bad.

Do you agree that if that if we need toom22 to allow further unbalanced
operands, we should not call mul_basecase for known-suboptimal operands,
but instead call toom32 for these operands?

(We don't need to generalise toom22 that way if the 2nd best function
for the area which is the target for such generalisations is really as
fast at toom22 there.)

  > Then, for the time being, please call toom22 from the testing code for
  > TOOM22_THRESHOLD <= bn <= an <= bn + TOOM22_THRESHOLD - epsilon, or
  > something like that.

  I'll try the above condition.

Thanks.

  >   On the other hand, I don't think it's a big problem if performance is
  >   awful for unusual inputs sizes to toom22 (e.g., toom22 for a 200000 x
  >   140000 multiply recursing to a 100000x40000 multiply done using
  >   schoolbook).
  >   
  > That multiplication would take several seconds on a high-end machine.

  But I think it's desirable to return a correct result even in this
  case, if one just has enough patience.  

As a design principle, I really disagree.  The mpn functions are
(implicitly) layered.  Low-level functions will be raw, and should
really not try to remedy situations such as out-of-range operands.  This
does not mean that we shouldn't try to make low-level functions as
general as possible, if it can be done without extra code or without
extra overhead.

In the case of toom22, you made us understand that the allowed operands
were not actually defined, and our usage was potentially outside of the
allowable range.

-- 
Torbjörn